Rory Sayres, Ankur Taly, Ehsan Rahimy, Katy Blumer, David Coz, Naama Hammel, Jonathan Krause, Arunachalam Narayanaswamy, Zahra Rastegar, Derek Wu, Shawn Xu, Lily Peng, Dale Webster; Assisted reads for diabetic retinopathy using a deep learning algorithm and integrated gradient explanation. Invest. Ophthalmol. Vis. Sci. 2018;59(9):1227.
Recent machine learning methods have produced models which can grade retinal fundus images for diabetic retinopathy (DR) with doctor-level accuracy. The impact of these models on DR diagnosis in assisted-read settings has not yet been measured. We investigated whether surfacing model predictions and explanatory saliency maps ("masks") to doctors improved DR grading accuracy, speed, and confidence.
We recruited 9 ophthalmologists to read 1,806 cases each for DR severity. Readers graded 45° fundus images centered on the macula. The image sample was representative of the diabetic population and was adjudicated by 3 retina specialists (1 also a reader) to establish ground-truth grades. Doctors read each image in one of 3 conditions: Unassisted, Grades Only, or Grades+Masks. The Grades Only condition surfaced a histogram of scores from a deep learning model trained to detect DR. The Grades+Masks condition additionally showed a mask generated with the integrated gradients explanation method, indicating the pixels that contributed most to the highest-scoring DR grade. Experimental conditions were counterbalanced across readers.
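The integrated gradients method referenced above attributes a model's prediction to input pixels by accumulating gradients along a straight-line path from a baseline image to the actual input. A minimal sketch of the computation (the function names, toy model, and midpoint Riemann approximation here are illustrative assumptions, not the study's implementation):

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate integrated gradients with a midpoint Riemann sum.

    grad_fn: gradient of the model output w.r.t. its input.
    x: input (e.g. a flattened fundus image); baseline: reference
    input (e.g. an all-zeros image).
    """
    alphas = (np.arange(steps) + 0.5) / steps  # midpoints of [0, 1]
    diff = x - baseline
    total = np.zeros_like(x)
    for a in alphas:
        total += grad_fn(baseline + a * diff)  # gradient along the path
    return diff * total / steps

# Toy model: f(x) = sum(x**2), so grad f = 2x. From a zero baseline the
# exact attributions are x**2, and they sum to f(x) - f(baseline)
# (the method's completeness property).
x = np.array([1.0, -2.0, 3.0])
attr = integrated_gradients(lambda z: 2 * z, x, np.zeros_like(x))
print(np.allclose(attr, x ** 2))
```

For a real DR model the gradient would come from the network's autodiff, and the per-pixel attributions would be rendered as the saliency "mask" shown to readers.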
Readers graded DR more accurately with model assistance than without (p < 0.01, logistic regressions). The accuracy lift was driven by cases with some degree of DR. Doctors also reported significantly higher confidence in their grades with assistance. Read times increased overall with assistance. The Grades Only and Grades+Masks conditions showed a similar pattern of improvement over Unassisted reads. For most cases, Grades Only was as effective as Grades+Masks. Masks were inapplicable in the no-DR cases. Masks provided additional benefit over grades alone in cases with some DR and low model certainty, with low image quality, and with proliferative DR (PDR) whose features were easy to miss (e.g., panretinal photocoagulation scars). Model assistance also shifted readers' operating points: they became markedly more sensitive to true positives, and either unchanged or slightly less specific.
Deep learning models can improve the accuracy of, and confidence in, DR diagnosis in an assisted read setting. Explanation masks can further improve diagnosis in some, but not all, cases.
This is an abstract that was submitted for the 2018 ARVO Annual Meeting, held in Honolulu, Hawaii, April 29 - May 3, 2018.
Experimental conditions. Left: Unassisted; center: Grades Only; right: Grades+Masks. Graders could toggle assistance on/off.
Summary metrics for DR grading accuracy, confidence, and time on task, ± 95% confidence intervals.