Abstract
Purpose:
Training modern analytical methods requires large amounts of clinical data, which are typically shared among multiple centers. Conventional data sharing involves copying and transferring the original data, which can raise privacy and ethical concerns. This study investigates a data-sharing technique in which pseudo data equivalent to the original data are generated by a data model, so that sharing the data reduces to sharing the model. The method is validated on visual field (VF) measurements.
Methods:
The United Kingdom Glaucoma Treatment Study collected 13878 Humphrey Field Analyzer 24-2 VF measurements. A Wasserstein generative adversarial network (W-GAN) was trained to learn two models: a generative model that maps samples from a multidimensional independent normal distribution to pseudo data, and an adversarial model that distinguishes pseudo from original data. The generative model is a 3-layer neural network with an 8-dimensional isotropic normally distributed input and 514 hidden units. The adversarial network takes a VF as input and predicts whether it is pseudo or original; it is a 4-layer neural network with 2 hidden layers of 514 units each. W-GAN trains the two networks jointly so that the adversarial network cannot distinguish the pseudo from the original data. With the trained generative model, a pseudo dataset was created by randomly generating 13878 samples. To validate whether the generative model encodes the correct structure-function relationship, the spatial correlations among test locations were compared between the pseudo and original VFs. In addition, each original VF was paired with the pseudo VF that had the minimum mean absolute error (MAE) to it.
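The two networks and the Wasserstein objective described above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: it covers only the forward passes, the losses, and sampling the pseudo dataset, and omits gradient-based training, which would normally use an autodiff framework. The 52-location output (24-2 grid without the two blind-spot points), the ReLU activations, the weight initialization, the reading of "3-layer" and "4-layer" as counting input and output layers, and the weight-clipping constraint from the original W-GAN formulation are all assumptions not stated in the abstract; names such as `generate` and `wgan_losses` are hypothetical.

```python
import numpy as np

# Assumption: 52 analysed 24-2 test locations (the two blind-spot points excluded).
N_LOCATIONS = 52
LATENT_DIM = 8   # from the abstract: 8-dimensional isotropic normal input
HIDDEN = 514     # from the abstract: 514 hidden units per hidden layer

rng = np.random.default_rng(0)


def init_layer(n_in, n_out):
    """One dense layer: small Gaussian weights and zero bias (assumed initialization)."""
    return rng.normal(0.0, 0.05, size=(n_in, n_out)), np.zeros(n_out)


# Generator, 3 layers (assumed reading): 8 inputs -> 514 hidden -> 52 outputs.
generator = [init_layer(LATENT_DIM, HIDDEN), init_layer(HIDDEN, N_LOCATIONS)]
# Critic ("adversarial network"), 4 layers (assumed reading): 52 -> 514 -> 514 -> 1 score.
critic = [init_layer(N_LOCATIONS, HIDDEN), init_layer(HIDDEN, HIDDEN), init_layer(HIDDEN, 1)]


def forward(layers, x):
    """Apply dense layers with ReLU on hidden layers and a linear output (assumed)."""
    for i, (w, b) in enumerate(layers):
        x = x @ w + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x


def generate(n):
    """Draw n pseudo VFs by passing isotropic normal noise through the generator."""
    z = rng.standard_normal((n, LATENT_DIM))
    return forward(generator, z)


def wgan_losses(real, fake):
    """W-GAN losses: the critic learns to score original VFs high and pseudo VFs low."""
    score_real = forward(critic, real).mean()
    score_fake = forward(critic, fake).mean()
    critic_loss = score_fake - score_real  # minimized w.r.t. critic weights
    generator_loss = -score_fake           # minimized w.r.t. generator weights
    return critic_loss, generator_loss


def clip_critic(c=0.01):
    """Weight clipping as in the original W-GAN, keeping the critic roughly Lipschitz."""
    for i, (w, b) in enumerate(critic):
        critic[i] = (np.clip(w, -c, c), np.clip(b, -c, c))
```

For example, `pseudo = generate(13878)` would produce a pseudo dataset of the same size as the original once the generator is trained. In the original W-GAN recipe, training alternates several critic updates (each followed by `clip_critic`) with one generator update, using gradients of these two losses; other Lipschitz constraints, such as a gradient penalty, are equally plausible here.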
Results:
The median difference in the correlation among test locations (Figure 1) between the pseudo and original data is 0.0284 (95% confidence interval [CI]: 0.0003 to 0.0938). The pseudo VFs therefore encode the same structure-function relationship as the original VFs. The MAE between original-pseudo VF pairs (Figure 2a) is 1.55 dB (95% CI: 0.96 to 3.98 dB). Most VFs in the original dataset have a paired VF in the pseudo dataset with an MAE below 4 dB (Figure 2b).
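The two comparisons reported above could be computed roughly as follows. This NumPy sketch assumes that each VF is a row of threshold values in dB, that the correlation comparison uses absolute differences between the inter-location Pearson correlations of the two datasets, and that the reported intervals are the 2.5th to 97.5th percentiles of the per-item values; none of these details are stated in the abstract, and all function names are hypothetical.

```python
import numpy as np


def location_correlation_difference(original, pseudo):
    """Absolute differences between inter-location Pearson correlations.

    Rows are VFs and columns are test locations; each location pair is
    returned once (upper triangle, diagonal excluded).
    """
    r_orig = np.corrcoef(original, rowvar=False)
    r_pseudo = np.corrcoef(pseudo, rowvar=False)
    iu = np.triu_indices_from(r_orig, k=1)
    return np.abs(r_orig[iu] - r_pseudo[iu])


def nearest_pseudo_mae(original, pseudo, chunk=256):
    """For each original VF, return the minimum MAE (dB) to any pseudo VF and its index."""
    n_loc = original.shape[1]
    best_mae = np.empty(len(original))
    best_idx = np.empty(len(original), dtype=int)
    for start in range(0, len(original), chunk):
        block = original[start:start + chunk]
        # Accumulate |difference| one location at a time so memory stays at chunk x n_pseudo.
        mae = np.zeros((len(block), len(pseudo)))
        for k in range(n_loc):
            mae += np.abs(block[:, k, None] - pseudo[None, :, k])
        mae /= n_loc
        best_idx[start:start + chunk] = mae.argmin(axis=1)
        best_mae[start:start + chunk] = mae.min(axis=1)
    return best_mae, best_idx


def summarize(values):
    """Median with 2.5th and 97.5th percentiles (assumed reading of the reported 95% CI)."""
    lo, med, hi = np.percentile(values, [2.5, 50, 97.5])
    return med, (lo, hi)
```

With `best_mae` from `nearest_pseudo_mae`, the expression `(best_mae < 4.0).mean()` gives the fraction of original VFs whose closest pseudo VF lies within 4 dB. Chunking matters at this scale: a full 13878 x 13878 x 52 difference array would need on the order of 80 GB in float64.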
Conclusions:
The trained generative model captures not only the spatial correlation of the VFs but also the distribution of the original dataset. Data closely matching the original can therefore be retrieved by sampling pseudo data from the generative model. The model may also be used to sample longitudinal VFs for the analysis of progression.
This is an abstract that was submitted for the 2018 ARVO Annual Meeting, held in Honolulu, Hawaii, April 29 - May 3, 2018.