Abstract
Purpose :
Medical datasets are generally imbalanced, which negatively affects the performance of deep learning models. In glaucoma studies, glaucoma suspect (GS) data are often excluded because of their ambiguous disease status. Our purpose is to test the effect of including GS data on the performance of a deep learning model that diagnoses glaucoma based solely on 3D optical coherence tomography (OCT) image data.
Methods :
A total of 16,794 OCT volumes (Cirrus, Zeiss, CA) were obtained from our clinical longitudinal glaucoma database, comprising 683 healthy (H), 7,793 glaucoma (G), and 8,318 GS volumes. A 3D convolutional neural network (CNN) architecture was trained in two ways. The first CNN was trained only with healthy and glaucoma images (H+G model), using balanced mini-batches during training to handle the class imbalance. The second CNN was trained with all the data, including GS samples (H+G+GS model). A semi-supervised learning (SSL) method was employed to include GS samples in the training procedure: the H+G model was first used to classify the GS samples, and the H+G+GS model was then trained on the whole dataset. We decreased the contribution of the GS sample labels to the loss function by a factor of 0.7 to reflect the uncertainty about those labels. For both models, the dataset was divided into 80% training, 10% validation, and 10% testing. The experiments were repeated five times, and the Mann-Whitney U test was used to compare the resulting metrics statistically.
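The training steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 0.5 decision threshold, the use of hard pseudo-labels, the binary cross-entropy loss, and the reading of "a factor of 0.7" as a multiplicative weight on GS terms are all assumptions made for illustration.

```python
import math
import random

# Assumption: GS loss terms are multiplied by 0.7 (one reading of the abstract).
GS_WEIGHT = 0.7


def balanced_batch(healthy, glaucoma, batch_size, rng):
    """Draw half of the mini-batch from each class, sampling with replacement.

    Returns a shuffled list of (sample, label) pairs, with 0 = healthy
    and 1 = glaucoma.
    """
    half = batch_size // 2
    batch = [(x, 0) for x in rng.choices(healthy, k=half)]
    batch += [(x, 1) for x in rng.choices(glaucoma, k=batch_size - half)]
    rng.shuffle(batch)
    return batch


def pseudo_label(prob_glaucoma, threshold=0.5):
    """Convert an H+G model probability into a hard label for a GS sample.

    The threshold is hypothetical; the abstract does not state one.
    """
    return 1 if prob_glaucoma >= threshold else 0


def weighted_bce(probs, labels, weights):
    """Mean binary cross-entropy with a per-sample weight on each term."""
    eps = 1e-7
    total = 0.0
    for p, y, w in zip(probs, labels, weights):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)
```

In this sketch, samples with clinical H or G labels would carry a weight of 1.0, and GS samples labelled by `pseudo_label` would carry `GS_WEIGHT`, so an error on a GS sample moves the loss less than the same error on a confirmed case.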
Results :
The H+G+GS model achieved a mean accuracy of 95.24% [94.98, 95.50] (values in brackets are 95% confidence intervals), an F1-score of 97.42% [97.29, 97.55], and an AUC of 95.64% [95.27, 96.01], while the H+G model achieved 94.34% [94.08, 94.60], 96.94% [96.79, 97.09], and 94.38% [93.90, 94.86], respectively. The improvements were statistically significant for all metrics. We argue that the gain was probably due to the similarity of GS samples to healthy and glaucomatous eyes (Figure), which is important for applying semi-supervised learning to imbalanced datasets.
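A comparison of two sets of five repeated runs with the Mann-Whitney U test could look like the sketch below. It assumes an exact two-sided test computed by enumerating all relabelings of the pooled runs; the abstract does not say which variant was used. The per-run values are hypothetical placeholders, not results from this study, which reports only means and confidence intervals.

```python
from itertools import combinations


def u_statistic(a, b):
    """U for sample a: number of pairs (x in a, y in b) with x > y; ties count 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


def exact_mann_whitney(a, b):
    """Return (U, exact two-sided p-value) by enumerating all group relabelings."""
    pooled = list(a) + list(b)
    n = len(a)
    mean_u = n * len(b) / 2.0  # expected U under the null hypothesis
    observed = abs(u_statistic(a, b) - mean_u)
    hits = total = 0
    for idx in combinations(range(len(pooled)), n):
        chosen = set(idx)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Count relabelings at least as extreme as the observed split.
        if abs(u_statistic(group_a, group_b) - mean_u) >= observed - 1e-12:
            hits += 1
    return u_statistic(a, b), hits / total


# Hypothetical per-run accuracies (%), for illustration only.
runs_a = [95.0, 95.1, 95.2, 95.3, 95.4]
runs_b = [94.1, 94.2, 94.3, 94.4, 94.5]
u, p = exact_mann_whitney(runs_a, runs_b)
```

With five runs per model there are only 252 possible relabelings, so under these assumptions the smallest attainable two-sided p-value is 2/252 (about 0.008), reached when every run of one model outperforms every run of the other.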
Conclusions :
Including the often-excluded GS samples improved deep learning-based glaucoma classification performance. The SSL technique allows the use of GS data, which helps to mitigate learning issues on an inherently imbalanced clinical glaucoma dataset.
This abstract was presented at the 2023 ARVO Annual Meeting, held in New Orleans, LA, April 23-27, 2023.