ARVO Annual Meeting Abstract  |   June 2015
Rapid grading of fundus photographs for diabetic retinopathy using crowdsourcing: External Validation
Author Affiliations & Notes
  • Christopher J Brady
    Wilmer Eye Institute, Johns Hopkins University, School of Medicine, Baltimore, MD
  • Andrea C Villanti
    The Schroeder Institute, Legacy, Washington, DC
  • Jennifer L Pearson
    The Schroeder Institute, Legacy, Washington, DC
  • Thomas R Kirchner
    The Schroeder Institute, Legacy, Washington, DC
  • Chirag Shah
    Retina Division, Ophthalmic Consultants of Boston, Boston, MA
  • Omesh P Gupta
    Mid Atlantic Retina, Wills Eye Hospital, Philadelphia, PA
  • Footnotes
    Commercial Relationships Christopher Brady, None; Andrea Villanti, None; Jennifer Pearson, None; Thomas Kirchner, None; Chirag Shah, None; Omesh Gupta, None
  • Footnotes
    Support None
Investigative Ophthalmology & Visual Science June 2015, Vol.56, 5253.
Christopher J Brady, Andrea C Villanti, Jennifer L Pearson, Thomas R Kirchner, Chirag Shah, Omesh P Gupta; Rapid grading of fundus photographs for diabetic retinopathy using crowdsourcing: External Validation. Invest. Ophthalmol. Vis. Sci. 2015;56(7):5253.


Purpose: To refine and externally validate a novel method for fundus photograph grading.

Methods: A crowdsourcing interface for fundus photograph classification, including annotated training images, was developed for Amazon Mechanical Turk (AMT) and refined based on user feedback. In Phase 1, nineteen expert-graded images were posted for categorization into 4 severity categories by AMT workers (Turkers), with 10 repetitions per photo; three sequential batches were posted with iterative refinements to the interface. In Phase 2, 400 images from the MESSIDOR public dataset of non-mydriatic fundus photographs were posted using the refined Phase 1 interface, and Turkers were asked to categorize each image as normal or abnormal. In Phase 3, iterative improvements were made to the interface in an attempt to further improve accuracy on the MESSIDOR dataset. The main outcome measure was the proportion of images whose consensus Turker score matched the expert/gold-standard score.
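The consensus-versus-expert outcome measure described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline; the majority-vote rule and the tie-breaking toward the less severe grade are assumptions, since the abstract does not specify how consensus was computed.

```python
from collections import Counter

def consensus_grade(worker_grades):
    """Majority vote across the repeated gradings of one image.

    worker_grades: list of severity labels (e.g. 0-3) from individual
    Turkers. Ties are broken toward the lower (less severe) grade --
    an assumption, since the abstract does not state a tie rule.
    """
    counts = Counter(worker_grades)
    top = max(counts.values())
    return min(grade for grade, n in counts.items() if n == top)

def consensus_accuracy(images):
    """Proportion of images whose consensus Turker grade matches the
    expert/gold-standard grade.

    images: list of (expert_grade, [worker_grades]) pairs.
    """
    matches = sum(
        consensus_grade(workers) == expert for expert, workers in images
    )
    return matches / len(images)
```

For example, an image graded [1, 1, 2] by three Turkers has consensus grade 1; if the expert grade is also 1, that image counts toward the accuracy numerator.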

Results: Across 190 grading instances in Phase 1, Turker consensus accuracy in 4-category grading increased from 26.3% to a maximum of 52.6%. Turker accuracy at categorizing images as normal vs. abnormal increased from a baseline of 89.5% to 100%. Throughout, 100% sensitivity for normal vs. abnormal was maintained; maximum specificity was 85.7%. Across 4000 grading instances in Phase 2, Turkers had an overall accuracy of 68.5%. Excluding the two mildest MESSIDOR disease categories, level 1 (<5 microaneurysms (MA)) and level 2 (<15 MA or <5 hemorrhages), accuracy increased to 80.9%, with a sensitivity of 92.4% and a specificity of 78.0%. Four of 53 cases (7.5%) of level 3 retinopathy (≥15 MA, ≥5 hemorrhages, or neovascularization) were missed.
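The sensitivity and specificity figures above follow the standard screening definitions, with "abnormal" as the positive class. A minimal sketch (illustrative only; the function name and input format are assumptions, not the authors' code):

```python
def screening_metrics(results):
    """Sensitivity and specificity for binary normal/abnormal screening.

    results: list of (truth, call) pairs, where each value is the string
    'abnormal' or 'normal'; 'abnormal' is treated as the positive class.
    """
    tp = sum(1 for t, c in results if t == "abnormal" and c == "abnormal")
    fn = sum(1 for t, c in results if t == "abnormal" and c == "normal")
    tn = sum(1 for t, c in results if t == "normal" and c == "normal")
    fp = sum(1 for t, c in results if t == "normal" and c == "abnormal")
    sensitivity = tp / (tp + fn)  # abnormal images correctly flagged
    specificity = tn / (tn + fp)  # normal images correctly passed
    return sensitivity, specificity
```

Under these definitions, the four missed level-3 cases reported above are false negatives, which is why sensitivity (92.4%) rather than specificity is the figure they reduce.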

Conclusions: With minimal training, the AMT workforce can rapidly and correctly categorize fundus photographs of diabetic patients as normal or abnormal when moderate to severe disease is present. Further refinement is required for Turkers to identify subtle disease and to correctly categorize its level. That Turker accuracy was preserved on a dataset different from the one used to develop the interface is a critical validation. Images were interpreted at a total cost of $1.10 per eye. Crowdsourcing may offer a novel and inexpensive means to reduce the skilled-grader burden and increase screening for diabetic retinopathy in some settings.

