Rory Sayres, Lin Yang, Abigail Huang, Shawn Xu, Siva Balasubramanian, Ilana Traynis, Anna Iurchenko, Sonali Verma, Daniel Golden; How much benefit can a deep learning system provide for diabetic retinopathy grading?: A Wizard-of-Oz study. Invest. Ophthalmol. Vis. Sci. 2020;61(7):3315.
© ARVO (1962-2015); The Authors (2016-present)
Deep-learning systems (DLS) may improve diabetic retinopathy (DR) assessment of retinal fundus images (Sayres et al., Ophthalmology 2019). However, previous methods left room for further gains in accuracy, and grading tended to be slower with assistance. The heatmap methods used previously may also have limited explanatory power for clinicians. We ask: What is the upper bound on the performance benefit from a DR assistant, if the DLS output is as accurate and interpretable as possible? And can a localization-based assistant provide clearer explanations to clinicians?
We ran a “Wizard-of-Oz”-style study in which assistive overlays were edited for optimal accuracy and presented to clinicians as automatically generated output. We trained a segmentation-based DLS on 13,647 hand-segmented images to localize DR-relevant pathologies such as hemorrhages and neovascularization. The trained DLS was applied to an evaluation set of 400 fundus images, enriched for cases with DR. DLS output was manually edited by an expert ophthalmologist to reflect the underlying pathology as accurately as possible. Four readers (2 retina specialists, 2 optometrists) read each case in a multi-reader, multi-case study with full crossover. Readers assessed DR gradability and severity for each case. Each batch was read either assisted by the DLS or unassisted; when assisted, the lesion-localization overlay could be toggled on or off as needed. Readers read each case in each arm, with a one-month washout between arms.
DR grading accuracy, evaluated against an adjudicated reference standard, increased substantially with assistance. For moderate or worse DR, specificity increased from 92.4% unassisted to 96.6% assisted (p = 0.01, Obuchowski-Rockette analysis), while sensitivity remained high (92.4% unassisted vs. 95.6% assisted; p = 0.14). Cohen’s quadratically weighted kappa for the 5-point DR grade increased significantly from 90.4% to 96.0%. Mean grading time decreased overall with assistance, from 99.6 sec to 87.8 sec (p = 0.002, t test).
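For readers unfamiliar with these metrics, the following is a minimal sketch of how sensitivity and specificity at the moderate-or-worse threshold, and a quadratically weighted kappa on a 5-point grade, could be computed for a single reader against a reference standard. It is not the study's analysis code: the integer grade encoding (0 = no DR through 4 = proliferative DR), the function names, and the toy data are assumptions made for illustration, and the sketch does not reproduce the multi-reader Obuchowski-Rockette analysis. Kappa is returned as a fraction; the abstract reports it as a percentage.

```python
from collections import Counter

# Assumed encoding of the 5-point DR grade (illustrative, not from the abstract):
# 0 = none, 1 = mild, 2 = moderate, 3 = severe, 4 = proliferative.
N_GRADES = 5
MODERATE = 2


def sensitivity_specificity(reference, graded, threshold=MODERATE):
    """Binarize grades at `threshold` (grade >= threshold is positive)
    and return (sensitivity, specificity) against the reference."""
    tp = fn = tn = fp = 0
    for ref, grade in zip(reference, graded):
        ref_pos = ref >= threshold
        grade_pos = grade >= threshold
        if ref_pos and grade_pos:
            tp += 1
        elif ref_pos:
            fn += 1
        elif grade_pos:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def quadratic_weighted_kappa(reference, graded, n_grades=N_GRADES):
    """Cohen's kappa with quadratic disagreement weights
    w_ij = (i - j)^2 / (n_grades - 1)^2.
    Undefined (division by zero) if every case has the same grade."""
    n = len(reference)
    observed = Counter(zip(reference, graded))
    ref_counts = Counter(reference)
    graded_counts = Counter(graded)
    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(n_grades):
        for j in range(n_grades):
            weight = (i - j) ** 2 / (n_grades - 1) ** 2
            observed_disagreement += weight * observed[(i, j)] / n
            expected_disagreement += (
                weight * ref_counts[i] * graded_counts[j] / (n * n)
            )
    return 1.0 - observed_disagreement / expected_disagreement


# Toy data (hypothetical, not from the study).
reference = [0, 1, 2, 3, 4, 2, 0, 1]
graded = [0, 1, 2, 3, 4, 1, 0, 2]
sens, spec = sensitivity_specificity(reference, graded)
kappa = quadratic_weighted_kappa(reference, graded)
```

In a multi-reader, multi-case design such as this one, per-reader values like these would typically feed into a variance model that accounts for reader and case correlation rather than being compared directly.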
Manually annotated, lesion-based localization assistance can produce significant improvements in DR grading accuracy and grading time. Further research should determine whether real-world systems can be developed with sufficiently high localization accuracy to produce the performance benefits seen in this study.
This is a 2020 ARVO Annual Meeting abstract.
Illustration of the lesion localization assistant.