Lu Yang, Carter Dunn, Abigail E Huang, Naama Hammel, Ilana Traynis, Monica Gandhi, Jonathan Krause, Sonia Phene; Performance of Deep Learning Glaucoma Suspect Models Compared to Various Reference Standards. Invest. Ophthalmol. Vis. Sci. 2020;61(7):4538.
© ARVO (1962-2015); The Authors (2016-present)
To train deep learning models that identify glaucoma suspects from color fundus photos and to compare their performance against various reference standards.
We trained two deep learning models on fundus photos to predict the presence of any referable anatomical abnormality (ANA) indicative of glaucoma. The first model (“3ANA”) outputs whether any of the following three ANAs is present in the fundus photo: vertical cup-to-disc ratio > 0.7, neuroretinal rim notch, or retinal nerve fiber layer defect. The second model (“4ANA”) assesses for any of the three ANAs or disc hemorrhage. We then measured the performance of 3ANA and 4ANA on two data sets with different reference standards. The primary validation set (n=1119) uses a reference standard based on three glaucoma specialists’ assessment of a single fundus photo. The secondary validation set consists of 346 eyes from 346 patients from an independent institution and uses a reference standard based on a complete clinical glaucoma workup, with each eye classified as glaucoma, glaucoma suspect, or not glaucoma.
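The two prediction targets described above can be written as simple label rules. The following is only a minimal sketch: the `FundusGrades` record and its field names are hypothetical, and the abstract does not say how individual graders' findings were combined into a training label.

```python
from dataclasses import dataclass


@dataclass
class FundusGrades:
    """Hypothetical per-image findings; field names are illustrative only."""
    vertical_cdr: float      # vertical cup-to-disc ratio
    rim_notch: bool          # neuroretinal rim notch present
    rnfl_defect: bool        # retinal nerve fiber layer defect present
    disc_hemorrhage: bool    # disc hemorrhage present


CDR_THRESHOLD = 0.7  # from the abstract: vertical cup-to-disc ratio > 0.7


def label_3ana(g: FundusGrades) -> bool:
    """True if any of the three ANAs is present."""
    return g.vertical_cdr > CDR_THRESHOLD or g.rim_notch or g.rnfl_defect


def label_4ana(g: FundusGrades) -> bool:
    """True if any of the three ANAs, or a disc hemorrhage, is present."""
    return label_3ana(g) or g.disc_hemorrhage
```

Under these rules, an image whose only finding is a disc hemorrhage is negative for 3ANA and positive for 4ANA, which is the sole difference between the two targets.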
When evaluated on a reference standard based on three glaucoma specialists’ assessment of a single fundus photo, 3ANA and 4ANA achieve AUCs of 0.890 and 0.861, respectively. When compared against a reference standard based on a full glaucoma workup, 3ANA and 4ANA achieve AUCs of 0.778 and 0.782, respectively.
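For readers unfamiliar with the metric, AUC is the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that pairwise definition. The function name and the toy inputs are illustrative, and the abstract does not state which software the authors used for the calculation.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) definition.

    labels: iterable of 0/1 (or bool) reference-standard labels
    scores: iterable of model scores, higher meaning more likely positive
    """
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy data, not from the study: 3 of the 4 positive/negative pairs are ordered correctly.
toy_auc = roc_auc([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.5])
```

The quadratic loop is fine for validation sets of the sizes reported here (n=1119 and n=346). A rank-based implementation would be preferable for much larger data sets.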
The models developed to detect glaucoma-related ANAs agree fairly well with glaucoma specialists’ assessment of the fundus photo alone. The apparent decrease in performance relative to the primary validation set, when evaluating against a reference standard based on the full glaucoma workup, may be due to differences in patient populations or in the breadth of clinical data used to arrive at the diagnosis.
This is a 2020 ARVO Annual Meeting abstract.
Figure 1. Receiver operating characteristic curve (ROC) of the 3ANA and 4ANA models on the primary validation dataset (n=1119), against a reference standard based on glaucoma specialists’ assessment of a single fundus photograph.
Figure 2. Receiver operating characteristic curve (ROC) of the 3ANA and 4ANA models on the secondary validation dataset (n=346), against a reference standard based on a full glaucoma workup.