Parmita Mehta, Aaron Y Lee, Joanne Wen, Michael R Bannit, Philip P Chen, Karine D Bojikian, Christine Petersen, Catherine A Egan, Su-In Lee, Magdalena Balazinska, Ariel Rokem; Automated detection of glaucoma using retinal images with interpretable deep learning. Invest. Ophthalmol. Vis. Sci. 2020;61(7):1150.
© ARVO (1962-2015); The Authors (2016-present)
Our goal is to develop a multi-modal model to automate glaucoma detection accurately using retinal images.
We selected a study cohort from the UK Biobank dataset: healthy (visual acuity 20/30 or better; s(ubjects)=863, r(etinas)=1193) and glaucoma (no other ophthalmic conditions; s=771, r=1283). The multi-modal model combines multiple deep neural networks (DNNs) trained on macular optical coherence tomography (OCT) volumes and color fundus photos (CFP). We also trained three baseline models (BM): BM1 used demographic data (age, gender, ethnicity), BM2 added systemic medical data (cardiovascular, pulmonary), and BM3 added ocular data (IOP, corneal hysteresis, corneal resistance factor). We determined the importance of different features in detecting glaucoma using SHapley Additive exPlanations (SHAP) and integrated gradients. We also evaluated the model on subjects who did not have a diagnosis of glaucoma on the day of imaging but were later diagnosed (progress-to-glaucoma (PTG); s=55, r=98). Finally, five glaucoma experts rated the test cohort (normal vs. glaucoma, scale 1-5) based on CFP.
Glaucoma experts' ratings on CFP (avg area under ROC curve (AUC): 0.82; inter-rater kappa: 0.75) were more accurate than the CFP-based DNN (AUC: 0.74), but the multi-modal model achieved higher accuracy (AUC: 0.96, Fig. 1). The multi-modal model was also more accurate than BM3 (AUC: 0.92). Age, IOP, BMI, forced vital capacity (FVC), and peak expiratory flow (PEF) were the top features (Fig. 2). The image-based model used information from the inferior retina and features of the optic disc; in the PTG group, it predicted glaucoma incidence with 69.4% accuracy.
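For readers unfamiliar with the AUC figures quoted above, the following is a minimal sketch of how an area under the ROC curve can be computed from ordinal ratings (such as a 1-5 scale) or model scores. It uses the rank-based interpretation of AUC: the probability that a randomly chosen glaucoma case receives a higher score than a randomly chosen healthy case, with ties counted as one half. The function name `roc_auc` and the toy labels and ratings are hypothetical and for illustration only; they are not data from the study, and this is not necessarily how the authors computed their values.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive case outranks
    a random negative case; tied scores contribute 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): 1 = glaucoma, 0 = healthy,
# with each retina rated on a 1-5 scale.
labels = [0, 0, 0, 1, 1, 1]
ratings = [1, 2, 4, 3, 5, 5]
toy_auc = roc_auc(labels, ratings)
print(f"toy AUC = {toy_auc:.3f}")
```

The pairwise loop is quadratic in the number of cases, which is fine for a small test cohort; a sort-based rank computation would be the usual choice for larger data.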
The multi-modal model performed better than models using OCT or CFP alone, suggesting that each modality carries distinct information. Interpreting BM3 provided evidence of previously known (BMI, IOP) and novel pulmonary (FVC, PEF) features associated with incidence of glaucoma.
This is a 2020 ARVO Annual Meeting abstract.
FIG 1: ROC curves for (A) systemic data, (B) image-based models, and (C) clinician ratings. AUC (+/- 95% CI) for (D) base models, (E) image-based models, and (F) clinicians. The gray line and shaded area denote the AUC and 95% CI for BM1.
FIG 2: SHAP values (SV) for BM3. (A) Top features. (B, C) SVs vs. features for top 4 features. (D) SVs vs. features with each point colored by age.