Travis Redd, J. Peter Campbell, James M Brown, Sang Jin Kim, Susan Ostmo, Robison Vernon Paul Chan, Jennifer Dy, Deniz Erdogmus, Stratis Ioannidis, Jayashree Kalpathy-Cramer, Michael F Chiang; Application of a Quantitative Image Analysis Scale Using Deep Learning for Detection of Clinically Significant ROP. Invest. Ophthalmol. Vis. Sci. 2018;59(9):2782.
© ARVO (1962-2015); The Authors (2016-present)
Retinopathy of prematurity (ROP) is a disease of preterm infants that causes significant visual morbidity, and access to screening for it is inadequate. Telemedicine using computerized image assessment offers a compelling opportunity to address this gap efficiently. We have previously demonstrated the near-perfect accuracy of a deep learning computer-generated severity score for diagnosing plus disease. Here we assess the clinical utility of this scoring system by evaluating its applicability to all parameters of ROP diagnosis, including zone, stage, and overall disease category.
Clinical examination and fundus photography were performed on at-risk infants from 7 participating centers. A deep learning-based system, trained to detect plus disease, generated a quantitative assessment of retinal vascular abnormality (the i-ROP plus score) on a scale of 1 to 9. Overall ROP disease category was established from a consensus reference standard diagnosis, following previously published methods. The area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of this score for the detection of clinically significant ROP were then determined.
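As an illustration of the metrics named above, the following is a minimal sketch of how AUROC and threshold-based sensitivity, specificity, PPV, and NPV can be computed from a list of scores and reference-standard labels. It is not the authors' analysis code. The function names, the convention that a score at or above the threshold counts as a positive call, and the toy scores and labels are all assumptions made for illustration; the toy values are not study data.

```python
from itertools import product


def auroc(scores, labels):
    """AUROC as the probability that a positive case outscores a negative one.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


def metrics_at_threshold(scores, labels, threshold):
    """Confusion-matrix metrics, assuming score >= threshold is a positive call.

    Raises ZeroDivisionError if a denominator is empty (e.g. no positive calls).
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    tn = sum(1 for s, y in pairs if s < threshold and y == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical toy data on a 1 to 9 scale (NOT from the study).
toy_scores = [1.2, 1.8, 2.5, 3.1, 4.9, 6.4, 2.0, 7.7]
toy_labels = [0, 0, 1, 0, 1, 1, 0, 1]

toy_auroc = auroc(toy_scores, toy_labels)
toy_metrics = metrics_at_threshold(toy_scores, toy_labels, threshold=2.3)
print(toy_auroc, toy_metrics)
```

The pairwise AUROC is quadratic in the number of examinations, which is fine for a sketch; a rank-based implementation would be preferable for a data set of several thousand examinations.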
A total of 5,219 eye examinations from 871 infants were analyzed. Of these, 1,100 examinations demonstrated type 2 or worse ROP, including 164 with type 1 ROP. The i-ROP plus score had an AUROC of 0.906 for detection of type 2 or worse ROP, and 0.949 for detection of type 1 ROP (Table). A score of 2.3 conferred 85% sensitivity, 81% specificity, 54% PPV, and 95% NPV for type 2 ROP or worse. A score of 4.8 had 88% sensitivity, 88% specificity, 19% PPV, and 99.6% NPV for type 1 ROP. The i-ROP plus score was less effective at detecting stage 3 disease (AUROC=0.864) and zone I disease (AUROC=0.719).
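The low PPV for type 1 ROP despite 88% sensitivity and specificity follows from its low prevalence in the cohort (164 of 5,219 examinations). The sketch below recomputes PPV and NPV from the reported sensitivity, specificity, and examination counts using Bayes' rule. The function name is hypothetical, and because the inputs are the rounded percentages given above, the results are expected to agree with the reported values only approximately.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


N_EXAMS = 5219  # total eye examinations analyzed

# Type 2 or worse ROP: 1,100 examinations, score threshold 2.3.
type2_ppv, type2_npv = predictive_values(0.85, 0.81, 1100 / N_EXAMS)

# Type 1 ROP: 164 examinations, score threshold 4.8.
type1_ppv, type1_npv = predictive_values(0.88, 0.88, 164 / N_EXAMS)

print(f"Type 2 or worse: PPV={type2_ppv:.3f}, NPV={type2_npv:.3f}")
print(f"Type 1:          PPV={type1_ppv:.3f}, NPV={type1_npv:.3f}")
```

With these inputs the computed values round to roughly 54% and 95% for type 2 or worse ROP and 19% and 99.6% for type 1 ROP, consistent with the figures reported above.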
Despite being trained only to recognize plus disease, this system has high accuracy for detecting clinically significant (type 2 or worse) ROP and fair accuracy for detecting stage 3 ROP. This confirms the clinical utility of a deep learning image assessment system for ROP diagnosis, with potential applications for disease screening in resource-limited settings. Future work focusing on training a deep learning algorithm to specifically identify zone and stage may lead to a fully automated system that can diagnose ROP as accurately as clinical examiners.
This is an abstract that was submitted for the 2018 ARVO Annual Meeting, held in Honolulu, Hawaii, April 29 - May 3, 2018.