Aaron S Coyner, J. Peter Campbell, Jayashree Kalpathy-Cramer, Praveer Singh, Kemal Sonmez, Michael F. Chiang; Retinal Fundus Image Generation in Retinopathy of Prematurity Using Autoregressive Generative Models. Invest. Ophthalmol. Vis. Sci. 2020;61(7):2166.
In retinopathy of prematurity (ROP), studies have shown that disagreement in absolute diagnosis (e.g., plus vs. pre-plus vs. normal) is common, even among experts. However, experts agree closely on relative disease rankings (e.g., Patient 1 is worse than Patient 2). To aid physician diagnosis, we propose a method that can increase or decrease the apparent disease severity in a patient's images, enabling intra-patient disease ranking.
Retinal fundus images were captured from preterm infants during routine ROP screening examinations as part of the Imaging and Informatics in ROP (i-ROP) study. Multiple experts diagnosed each image as plus, pre-plus, or normal, and a consensus diagnosis was formed. Plus disease images accounted for only ~10% of the dataset, so downsampling was used to prevent overfitting the model to classes with higher representation. The final dataset consisted of 1,989 images. Random image transformations were applied to augment the size of the training dataset. A vector-quantized variational autoencoder (VQ-VAE) was trained to encode images into latent representations and to decode (reconstruct) images from those representations. Reconstruction performance was evaluated using mean squared error (MSE). A PixelCNN was then trained to sample from the latent space to create new retinal images of eyes with normal, pre-plus, or plus disease.
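The core operation of a VQ-VAE is replacing each continuous encoder output with its nearest entry in a learned codebook, yielding a discrete latent representation. The following is a minimal NumPy sketch of that quantization step only; the encoder, decoder, codebook sizes, and random toy data here are illustrative assumptions, not the architecture used in the study.

```python
import numpy as np

def vector_quantize(z, codebook):
    # z: (n, d) continuous encoder outputs; codebook: (K, d) learned embeddings.
    # Nearest-neighbor lookup: each latent vector is replaced by the closest code,
    # producing the discrete indices a PixelCNN prior can later model.
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)  # (n, K)
    indices = dists.argmin(axis=1)                                      # discrete codes
    return codebook[indices], indices

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))   # K=8 codes of dimension 4 (toy sizes)
z = rng.normal(size=(16, 4))         # 16 latent vectors standing in for encoder output
z_q, codes = vector_quantize(z, codebook)
```

In a full VQ-VAE, the decoder reconstructs the image from `z_q`, and the quantization is trained with a straight-through gradient estimator plus codebook and commitment losses, none of which are shown here.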
The VQ-VAE mapped meaningful features of retinal fundus images, across varying disease severities, to latent representations 1/48th the size of the original images. Images were reconstructed nearly perfectly from these compressed representations (Fig. 1), as evidenced by the similar MSE values on the train (2.46e-4) and test (2.62e-4) datasets. During this process, a latent representation of the entire dataset was formed. A PixelCNN was trained to sample from this space to create new images of varying disease severities.
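The reconstruction metric reported above is ordinary pixel-wise mean squared error between an original image and its decoded reconstruction. A minimal sketch, with toy data standing in for the fundus images:

```python
import numpy as np

def reconstruction_mse(original, reconstructed):
    """Pixel-wise mean squared error between an image batch and its reconstruction."""
    original = np.asarray(original, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    return float(np.mean((original - reconstructed) ** 2))

rng = np.random.default_rng(1)
images = rng.random((4, 8, 8))                            # toy batch of 4 "images"
decoded = images + rng.normal(scale=0.01, size=images.shape)  # near-perfect reconstruction
err = reconstruction_mse(images, decoded)
```

With a near-perfect reconstruction (noise scale 0.01), `err` lands near 1e-4, the same order as the train/test MSE values reported above.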
Patient images can be encoded into latent representations and reconstructed with increased or decreased vascular severity. These images can aid physician diagnosis by allowing a patient's fundus image to be compared against images showing how the same eye might appear if the condition were to improve or worsen. Future work will focus on validating this approach as a viable diagnostic aid, improving the realism of generated images, and traversing the latent space to return predicted disease diagnoses.
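One simple way to picture "reconstructing with increased or decreased severity" is a linear traversal of a latent code along a severity-associated direction. This is a hypothetical sketch only: the abstract does not specify the traversal mechanism, and the actual model's latents are discrete VQ codes, so a continuous shift like this is an illustrative assumption.

```python
import numpy as np

def shift_severity(z, severity_direction, alpha):
    # Hypothetical linear traversal: move a latent code along a direction
    # assumed to correlate with vascular severity. alpha > 0 worsens,
    # alpha < 0 improves, alpha = 0 returns the original representation.
    return z + alpha * severity_direction

z = np.zeros(4)                               # toy latent code
direction = np.array([1.0, 0.0, -1.0, 0.5])   # assumed "severity" axis
worse = shift_severity(z, direction, 0.5)
better = shift_severity(z, direction, -0.5)
```

Decoding `worse` and `better` through the trained decoder would then yield the hypothetical "if worsened" and "if improved" images used for intra-patient comparison.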
This is a 2020 ARVO Annual Meeting abstract.
Figure 1: Original images (top) versus their latent space reconstructions (bottom).