ARVO Annual Meeting Abstract  |   June 2021
Volume 62, Issue 8  |   Open Access
Building a labeled dataset for training an Artificial Intelligence (AI) algorithm for glaucoma screening
Author Affiliations & Notes
  • Hans G Lemij
    Glaucoma Service, The Rotterdam Eye Hospital, Rotterdam, Netherlands
  • Helga Kliffen
    Glaucoma Service, The Rotterdam Eye Hospital, Rotterdam, Netherlands
  • Koen Vermeer
    Rotterdams Oogheelkundig Instituut, Rotterdam, Zuid Holland, Netherlands
  • Footnotes
    Commercial Relationships: Hans Lemij, None; Helga Kliffen, None; Koen Vermeer, None
    Support: None
Investigative Ophthalmology & Visual Science June 2021, Vol. 62, 1019.
Abstract

Purpose : Worldwide, too many people are visually impaired by glaucoma, largely because the disease is detected too late. Our aim was to build a labeled dataset for training an AI algorithm for automated glaucoma screening by fundus photography.

Methods : Color fundus photographs of over 110,000 eyes were obtained from EyePACS, California, USA, from a population screening programme for diabetic retinopathy. A dedicated tool was developed for efficient grading. Thirty carefully selected graders (ophthalmologists and optometrists) graded the images. To qualify, candidates had to pass the EODAT¹ stereoscopic optic disc assessment with at least 85% accuracy and 92% specificity; of 87 candidates, 30 passed. Each image of the EyePACS set was then scored by varying pairs of two randomly matched graders as ‘Referable glaucoma’, ‘No referable glaucoma’, or ‘Ungradable’. In case of disagreement, a glaucoma specialist (‘third grader’) made the final grading. ‘Referable glaucoma’ was scored only if visual field damage was expected.

1. Reus NJ, Lemij HG, et al. Clinical Assessment of Stereoscopic Optic Disc Photographs for Glaucoma: The European Optic Disc Assessment Trial. Ophthalmology. 2010 Apr;117(4):717-723.
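
To make the pair-plus-adjudication protocol above concrete, here is a minimal Python sketch of how a final label could be derived for each image; the class and function names (ImageGrading, final_label) are illustrative assumptions, not part of the study's actual grading tool.

from dataclasses import dataclass
from typing import Optional

@dataclass
class ImageGrading:
    """Gradings collected for one fundus image (names are illustrative)."""
    grader_a: str                      # label from the first randomly matched grader
    grader_b: str                      # label from the second randomly matched grader
    adjudicator: Optional[str] = None  # glaucoma specialist's label, if needed

def final_label(g: ImageGrading) -> str:
    """Final label: consensus of the pair, otherwise the specialist's call."""
    if g.grader_a == g.grader_b:
        return g.grader_a
    if g.adjudicator is None:
        raise ValueError("Disagreement between graders requires a third (specialist) grading")
    return g.adjudicator

# Example: the two graders disagree, so the specialist's label decides.
print(final_label(ImageGrading("Referable glaucoma", "No referable glaucoma",
                               adjudicator="Referable glaucoma")))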

Results : Approximately 14,000 eyes were graded per week. Over the first two weeks, the average time per grading was 21.6 sec, although grading an image as ‘Referable glaucoma’ took longer, 50.3 sec on average. Approximately 20% of the images were scored by a third grader. Initially, the overall sensitivity and specificity were 84% and 90%, respectively. The reference standard for these scores was the final label, i.e., the consensus of the first two graders or, in case of disagreement, the label of the third grader. Measures were then taken to improve these scores for subsequent gradings. Six graders were disqualified from further participation. Both individualized and general feedback was provided to each of the remaining 24 graders. In addition, online meetings were scheduled to discuss difficult cases. Graders with high sensitivity were paired with graders who showed high specificity. With these measures, the individual scores improved, as did the overall quality of the labels.
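
As a rough illustration of how the sensitivity and specificity figures above could be computed for an individual grader against the adjudicated reference standard, here is a short Python sketch; treating ‘Referable glaucoma’ as the positive class and excluding ‘Ungradable’ images are our assumptions, not details stated in the abstract.

def sensitivity_specificity(grader_labels, reference_labels):
    """One grader's sensitivity and specificity against the reference (final) labels."""
    tp = tn = fp = fn = 0
    for graded, ref in zip(grader_labels, reference_labels):
        # Assumption: skip images that either the grader or the reference marked ungradable.
        if graded == "Ungradable" or ref == "Ungradable":
            continue
        pos = graded == "Referable glaucoma"
        ref_pos = ref == "Referable glaucoma"
        if ref_pos:
            tp += pos
            fn += not pos
        else:
            fp += pos
            tn += not pos
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity

# Example with three gradings compared against the reference standard.
grader = ["Referable glaucoma", "No referable glaucoma", "No referable glaucoma"]
reference = ["Referable glaucoma", "No referable glaucoma", "Referable glaucoma"]
print(sensitivity_specificity(grader, reference))  # (0.5, 1.0)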

Conclusions : Building a labeled dataset is a huge but quite feasible task, which calls for careful planning, execution, monitoring and refinement.

This is a 2021 ARVO Annual Meeting abstract.
