Abstract
Purpose :
Deep-learning systems (DLS) may improve diabetic retinopathy (DR) assessment of retinal fundus images (Sayres et al., Ophthalmology 2019). However, previous methods left room for further gains in accuracy, and grading tended to be slower with assistance. The heatmap methods used previously may also have limited explanatory power for clinicians.
We ask: What is the upper bound on performance benefit from a DR assistant, if the DLS output is as accurate and interpretable as possible? And can a localization-based assistant provide clearer explanations to clinicians?
Methods :
We ran a “Wizard-of-Oz” style study in which assistive overlays were edited for optimal accuracy, and presented to clinicians as automatically generated output. We trained a segmentation-based DLS on 13,647 hand-segmented images to localize DR-relevant pathologies such as hemorrhages and neovascularization. The trained DLS was applied to an evaluation set of 400 fundus images, enriched for cases with DR. DLS output was manually edited by an expert ophthalmologist to reflect the underlying pathology as accurately as possible.
Four readers (2 retina specialists, 2 optometrists) read each case in a multi-reader, multi-case study with full crossover. Readers assessed DR gradability and severity for each case. Each batch was read either assisted by the DLS or unassisted; when assisted, the lesion-localization overlay could be toggled on/off as needed. Readers read each case in each arm, with a one-month washout in between.
Results :
DR grading accuracy, evaluated against an adjudicated reference standard, increased substantially with assistance. For moderate-or-worse DR, specificity increased from 92.4% unassisted to 96.6% assisted (p = 0.01, Obuchowski-Rockette analysis), while sensitivity remained high and did not change significantly (92.4% unassisted vs. 95.6% assisted; p = 0.14). Cohen’s quadratically weighted kappa for the 5-point DR grade increased significantly, from 90.4% to 96.0%. Mean grading time decreased with assistance, from 99.6 sec to 87.8 sec (p = 0.002, t-test).
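For readers unfamiliar with the metrics reported above, the following is a minimal sketch of how sensitivity and specificity for moderate-or-worse DR, and a quadratically weighted Cohen's kappa, could be computed for one reader against the reference standard. It is illustrative only and is not the study's analysis code. The integer encoding of the 5-point scale (0 = no DR through 4 = proliferative DR, with 2 = moderate), the function names, and the toy grades are all assumptions; the Obuchowski-Rockette multi-reader analysis is not reproduced here.

```python
def sens_spec_at_threshold(reference, graded, threshold=2):
    """Sensitivity and specificity for 'grade >= threshold'.

    Assumes grades are integers 0..4 and that 2 means moderate DR
    (a hypothetical encoding, not stated in the abstract).
    """
    tp = fn = tn = fp = 0
    for ref, pred in zip(reference, graded):
        ref_pos = ref >= threshold
        pred_pos = pred >= threshold
        if ref_pos and pred_pos:
            tp += 1
        elif ref_pos:
            fn += 1
        elif pred_pos:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def quadratic_weighted_kappa(a, b, n_grades=5):
    """Cohen's kappa with quadratic disagreement weights (i - j)^2 / (k - 1)^2."""
    n = len(a)
    # Observed confusion counts between the two sets of grades.
    observed = [[0] * n_grades for _ in range(n_grades)]
    for i, j in zip(a, b):
        observed[i][j] += 1
    row_totals = [sum(observed[i]) for i in range(n_grades)]
    col_totals = [sum(observed[i][j] for i in range(n_grades))
                  for j in range(n_grades)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_grades):
        for j in range(n_grades):
            weight = (i - j) ** 2 / (n_grades - 1) ** 2
            weighted_observed += weight * observed[i][j]
            # Expected count under chance agreement from the marginals.
            weighted_expected += weight * row_totals[i] * col_totals[j] / n
    return 1.0 - weighted_observed / weighted_expected


# Toy data for illustration only; these are NOT grades from the study.
reference = [0, 0, 1, 2, 2, 3, 4, 4]
graded = [0, 1, 1, 2, 1, 3, 4, 3]

sensitivity, specificity = sens_spec_at_threshold(reference, graded)
kappa = quadratic_weighted_kappa(reference, graded)
print(f"sensitivity={sensitivity:.3f} specificity={specificity:.3f} kappa={kappa:.3f}")
```

Quadratic weighting penalizes a disagreement by the square of its distance on the grading scale, so a two-step miss counts four times as much as a one-step miss. This makes the statistic more sensitive to large grading errors than unweighted kappa.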
Conclusions :
Manually annotated, lesion-based localization assistance can produce significant improvements in DR grading accuracy and grading time. Further research should determine whether real-world systems can be developed with sufficiently high localization accuracy to produce the performance benefits seen in this study.
This is a 2020 ARVO Annual Meeting abstract.