J. Peter Campbell, James M Brown, Susan Ostmo, R.V. Paul Chan, Jennifer Dy, Deniz Erdogmus, Stratis Ioannidis, Jayashree Kalpathy-Cramer, Michael F Chiang; Artificial intelligence in retinopathy of prematurity: clinical validation of a fully automated deep learning system (i-ROP DL) for plus disease diagnosis. Invest. Ophthalmol. Vis. Sci. 2018;59(9):3936.
© ARVO (1962-2015); The Authors (2016-present)
We compare the performance of i-ROP DL, a fully automated deep learning (DL) system for plus disease diagnosis, with the performance of expert ROP clinicians.
Using a deep convolutional neural network (Deep-ROP) described elsewhere (Brown et al, ARVO 2018), we developed a fully automated, open source deep learning system (i-ROP DL) for plus disease diagnosis as part of the ongoing “Imaging and Informatics in ROP” (i-ROP) study. We evaluated i-ROP DL on an independent test set of 100 images that had previously been graded, and ranked in order of disease severity, by 8 international ROP experts using previously published methods. The diagnostic performance of the DL algorithm was compared with that of the experts using weighted kappa statistics. A continuous severity score (from 1 to 9) was derived from the DL output and compared with the experts' ordered ranking of disease severity.
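As an illustration of the agreement statistic named above, the following is a minimal sketch of a weighted kappa between two sets of ordinal grades. The abstract does not state the weighting scheme or the number of diagnostic categories, so the linear/quadratic option, the integer coding of categories, and the function name `weighted_kappa` are all assumptions made for this example, not the study's actual analysis code.

```python
import math


def weighted_kappa(rater_a, rater_b, n_categories, weights="linear"):
    """Weighted kappa for two lists of ordinal grades coded 0..n_categories-1.

    Hypothetical helper: the weighting scheme used in the study is not
    stated in the abstract, so both common choices are offered here.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("rating lists must be non-empty and of equal length")
    if n_categories < 2:
        raise ValueError("need at least two categories")
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")

    n = len(rater_a)
    k = n_categories

    # Observed joint proportions of (grade by A, grade by B).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    # Weighted observed disagreement vs. disagreement expected by chance.
    num = sum(disagreement(i, j) * observed[i][j]
              for i in range(k) for j in range(k))
    den = sum(disagreement(i, j) * row[i] * col[j]
              for i in range(k) for j in range(k))
    if den == 0.0:
        # Both raters used one identical category; kappa is undefined.
        return math.nan
    return 1.0 - num / den


# Usage with made-up grades on an assumed three-level ordinal scale.
reference = [0, 0, 1, 1, 2, 2]
algorithm = [0, 0, 1, 2, 2, 2]
kappa = weighted_kappa(reference, algorithm, n_categories=3)
```

Weighting penalizes a two-level disagreement more heavily than a one-level disagreement, which is why a weighted statistic suits ordinal grades better than unweighted kappa.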
Figure 1 shows the weighted kappa statistics for each of the 8 graders, the reference standard diagnosis (RSD), the consensus diagnosis, and the i-ROP DL diagnosis. The weighted kappa for i-ROP DL compared with the RSD was 0.92, higher than that of 6 of the 8 experts. In the test set, i-ROP DL correctly diagnosed 91/100 (91%) images, whereas the 8 experts had an average accuracy of 82% (range 77%-94%, previously published). Figure 2 displays the i-ROP DL derived severity score for each of the 100 images against its rank in order of disease severity, as determined by experts using pairwise comparisons.
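One way to quantify how well a continuous severity score tracks an expert rank ordering is a Spearman rank correlation. The abstract does not say which statistic, if any, was used for this comparison, so the sketch below is only an assumed approach; the helper names `average_ranks` and `spearman` and the sample values are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks of values, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0.0 or vy == 0.0:
        # One input is constant; the correlation is undefined.
        return math.nan
    return cov / math.sqrt(vx * vy)


# Usage with made-up values: scores on a 1-to-9 scale vs. expert severity rank.
scores = [1.2, 3.4, 5.0, 8.8]
expert_rank = [1, 2, 3, 4]
rho = spearman(scores, expert_rank)
```

A value near 1 would indicate that images ranked as more severe by the experts also tend to receive higher scores.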
The i-ROP DL system classified plus disease with accuracy comparable to that of international ROP experts. Incorporation of this technology into routine ROP care could provide an objective method of documenting and monitoring disease severity in ROP.
This is an abstract that was submitted for the 2018 ARVO Annual Meeting, held in Honolulu, Hawaii, April 29 - May 3, 2018.
Figure 1. Weighted kappa statistics between 8 individual expert readers, the consensus of 8 readers, the reference standard diagnosis (RSD), and the i-ROP DL program.
Figure 2. i-ROP score as a function of disease severity among 100 ranked images in the test set.