ARVO Annual Meeting Abstract  |  June 2022
Volume 63, Issue 7 (Open Access)
Assessment of AI Algorithms for Diabetic Retinopathy Classification Using Model Cards
Author Affiliations & Notes
  • Dinah Chen
    Ophthalmology, NYU Langone Health, New York, New York, United States
  • Samuel Lee
    Grossman School of Medicine, New York University, New York, New York, United States
  • Cansu Elgin
    Ophthalmology, Sisli Etfal Research and Training Hospital, Istanbul, Turkey
  • Raymond Zhou
    Vanderbilt University School of Medicine, Nashville, Tennessee, United States
  • Alexi Geevarghese
    Ophthalmology, NYU Langone Health, New York, New York, United States
  • Lama A Al-Aswad
    Ophthalmology, NYU Langone Health, New York, New York, United States
  • Footnotes
    Commercial Relationships   Dinah Chen, Clover Therapeutics, Code C (Consultant/Contractor); Samuel Lee, None; Cansu Elgin, None; Raymond Zhou, None; Alexi Geevarghese, None; Lama Al-Aswad, Aerie Pharmaceuticals, Code C (Consultant/Contractor), Zeiss, Code C (Consultant/Contractor), Topcon, Code C (Consultant/Contractor), GlobeChek, Code I (Personal Financial Interest), Save Vision Foundation, Code R (Recipient), New World Medical, Code R (Recipient), AI Optics, Code S (non-remunerative)
    Support   Research to Prevent Blindness grant
Investigative Ophthalmology & Visual Science June 2022, Vol. 63, 3005 – F0275.
Abstract

Purpose : In the last decade, there have been vast advances in artificial intelligence (AI) in ophthalmology. However, reporting in the AI literature remains largely unstandardized, and algorithmic fairness remains difficult to assess. In this preliminary study, we evaluate 59 studies on the development, validation, and clinical trialing of AI tools for referable diabetic retinopathy (RDR) diagnosis on measures of transparency. To do so, we employ a scoring system based on an AI model card, a framework for benchmarked assessment of algorithmic fairness.

Methods : We identified 59 studies of AI algorithms for RDR diagnosis using fundus photographs: 17 reported on algorithm training and internal validation, 26 on external validation, and 16 on prospective clinical validation of RDR algorithms. We applied our model card scoring system to these studies to broadly assess algorithm transparency. Scored model card elements include basic model details (e.g., model version), elements of intended use, input/output definitions and architecture, training and evaluation dataset details (e.g., source, size, demographics), performance measures (area under the curve (AUC), sensitivity (SE), and specificity (SP)), and ethical factors relating to algorithm bias.
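As a rough illustration of the scoring approach, the sketch below encodes a model card checklist as a simple Python structure, with one point per reported element. The abstract reports a 22-point maximum but does not enumerate the individual items, so the element names here are illustrative assumptions, not the authors' actual rubric.

```python
# Hypothetical sketch of a model card transparency rubric. The element
# names are illustrative; the abstract does not list the actual 22 items.
MODEL_CARD_ELEMENTS = [
    # Basic model details
    "model_version", "model_date", "developer",
    # Intended use
    "scope_of_use", "intended_users",
    # Inputs/outputs and architecture
    "input_definition", "output_definition", "architecture",
    # Training and evaluation dataset details
    "training_source", "training_size", "training_demographics",
    "evaluation_source", "evaluation_size", "evaluation_demographics",
    # Performance measures
    "auc", "sensitivity", "specificity",
    # Ethical factors relating to bias
    "bias_assessment", "race_reporting",
]

def transparency_score(reported_elements: set[str]) -> int:
    """Award one point for each rubric element a study reports."""
    return sum(1 for e in MODEL_CARD_ELEMENTS if e in reported_elements)

# Example: a study reporting only its performance measures scores 3.
print(transparency_score({"auc", "sensitivity", "specificity"}))
```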

Results : Out of a total possible score of 22, clinical validation studies scored an average of 16.7 (range 13-20), representing a moderate level of transparency. Only 1 of the 16 clinical validation studies defined a clear scope of use, and only 3/16 reported data on race. While nearly all reported sensitivity and specificity, only 9/16 reported AUC and only 4/16 reported imageability. Clinical validation studies enrolled an average of 1,094 patients (range 143-4,381). Average AUC, SE, and SP were 0.9305, 90.8%, and 85.8%, respectively. Reporting in training and external validation studies likewise varied widely; 6/43 reported race data. Training datasets ranged from 89 to 466,247 images, averaging 52,035 images. In training studies, average AUC, SE, and SP were 0.960, 90.9%, and 89.46%, respectively; in external validation studies, average AUC, SE, and SP were 0.942, 92.4%, and 86.17%, respectively.
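For concreteness, the minimal sketch below shows how summary figures like the averages and ranges above can be aggregated from per-study scores. The values are placeholders; the abstract does not list the underlying study-level data.

```python
from statistics import mean

# Placeholder per-study transparency scores (out of 22); the actual
# study-level values are not reported in the abstract.
scores = [13, 14, 16, 17, 18, 19, 20]

print(f"average {mean(scores):.1f}, range {min(scores)}-{max(scores)}")
```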

Conclusions : Our results demonstrate high variability in the reporting of AI algorithms for RDR, with many clinical validation studies showing only moderate or poor transparency. Model cards may help promote fairness and standardization in AI reporting.

This abstract was presented at the 2022 ARVO Annual Meeting, held in Denver, CO, May 1-4, 2022, and virtually.

 
