Investigative Ophthalmology & Visual Science
June 2024
Volume 65, Issue 7
Open Access
ARVO Annual Meeting Abstract | June 2024
Assessing the Accuracy of ChatGPT in Diagnosing Ocular Disease
Author Affiliations & Notes
  • Belinda Ikpoh
    Ophthalmology, Montefiore Medical Center, New York, New York, United States
    Albert Einstein College of Medicine, Bronx, New York, United States
  • Jee-young Moon
    Albert Einstein College of Medicine, Bronx, New York, United States
  • Viral Juthani
    Ophthalmology, Montefiore Medical Center, New York, New York, United States
    Albert Einstein College of Medicine, Bronx, New York, United States
  • Footnotes
    Commercial Relationships: Belinda Ikpoh, None; Jee-young Moon, None; Viral Juthani, None
    Support: None
Investigative Ophthalmology & Visual Science June 2024, Vol.65, 339. doi:

      Belinda Ikpoh, Jee-young Moon, Viral Juthani; Assessing the Accuracy of ChatGPT in Diagnosing Ocular Disease. Invest. Ophthalmol. Vis. Sci. 2024;65(7):339.

      © ARVO (1962-2015); The Authors (2016-present)

Abstract

Purpose : Chat Generative Pre-trained Transformer (ChatGPT) is an artificial intelligence chatbot designed to solve problems by drawing on stored knowledge. As patients increasingly use online resources to investigate their symptoms, we sought to analyze how accurately ChatGPT answers board-style ophthalmology questions and diagnoses acute ocular problems.

Methods : The American Academy of Ophthalmology question bank was used to provide 15 questions from 11 subspecialty categories. When ChatGPT answered incorrectly, feedback was given and the question was re-asked. Additionally, electronic medical record (EMR) charts were reviewed over a one-month period for patients presenting to the triage clinic with a new, acute ocular problem. The chief complaint and elements of the ophthalmic exam were entered into ChatGPT to generate a response. ChatGPT's primary and differential diagnoses were checked against the ophthalmologist's diagnosis, and the numbers of correct and incorrect responses were calculated for each patient encounter. Data were analyzed using Fisher's exact test and pairwise comparisons.
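
As a rough illustration of the analysis described above, and not the authors' code, the sketch below tallies correct and incorrect responses per category and runs a pairwise 2x2 Fisher's exact test in Python with SciPy. All counts are hypothetical placeholders and the category names are examples only.

# Minimal sketch (not the study's code): per-category accuracy and a pairwise
# Fisher's exact test, as described in Methods. All counts are hypothetical.
from scipy.stats import fisher_exact

counts = {
    "cornea": {"correct": 10, "incorrect": 5},   # hypothetical tallies
    "retina": {"correct": 8, "incorrect": 7},
}

for name, c in counts.items():
    total = c["correct"] + c["incorrect"]
    print(f"{name}: accuracy = {c['correct'] / total:.0%}")

# Pairwise 2x2 comparison of two categories
table = [
    [counts["cornea"]["correct"], counts["cornea"]["incorrect"]],
    [counts["retina"]["correct"], counts["retina"]["incorrect"]],
]
odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(f"Fisher's exact test: OR = {odds_ratio:.2f}, p = {p_value:.3f}")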

Results : ChatGPT's overall first-attempt accuracy on board-style questions was 59% (95% CI = 0.51-0.67; p < 0.001), significantly higher than chance. Accuracy varied from 40% (optics) to 73% (uveitis); all categories except optics (p = 0.229) performed significantly better than chance. Overall second-attempt accuracy was 24% (95% CI = 0.14-0.36), and first- and second-attempt accuracies were inversely correlated (Pearson correlation coefficient = -0.93). Combining all attempts, the average accuracy was 69% (95% CI = 0.61-0.76). For EMR cases, ChatGPT's primary diagnosis matched the physician's clinic diagnosis 67% of the time, and its differential included the correct diagnosis 88% of the time. Accuracy did not differ between specialties (neuro, plastics, cornea, retina, uveitis) by Fisher's exact test (all p > 0.79).
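
For orientation only, and not the authors' analysis, the sketch below shows how a binomial confidence interval, a test against chance, and the Pearson correlation between attempts could be computed with SciPy. The question total (165), the correct count (97), the 25% chance rate for four-option items, and the per-category accuracy lists are all assumptions, not reported data.

# Minimal sketch (assumptions, not reported data): binomial CI / test against
# chance and the first- vs second-attempt correlation described in Results.
from scipy.stats import binomtest, pearsonr

n_questions = 165   # assumed total first-attempt questions (15 x 11 categories)
n_correct = 97      # assumed count corresponding to ~59% accuracy
chance_rate = 0.25  # assumed chance level for four-option questions

result = binomtest(n_correct, n_questions, p=chance_rate, alternative="greater")
ci = result.proportion_ci(confidence_level=0.95)
print(f"accuracy = {n_correct / n_questions:.0%}, "
      f"95% CI = {ci.low:.2f}-{ci.high:.2f}, p = {result.pvalue:.2g}")

# Pearson correlation between hypothetical per-category accuracies on the
# first and second attempts (illustrative values only)
first_attempt = [0.40, 0.53, 0.60, 0.67, 0.73]
second_attempt = [0.45, 0.35, 0.25, 0.20, 0.10]
r, p = pearsonr(first_attempt, second_attempt)
print(f"Pearson r = {r:.2f}, p = {p:.3f}")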

Conclusions : ChatGPT answers a slight majority of board-style questions correctly but does not improve with feedback. ChatGPT is likely to include the correct diagnosis within its differential for clinical patients documented in the EMR, although this likely depends on relevant history and exam elements being provided to ChatGPT.

This abstract was presented at the 2024 ARVO Annual Meeting, held in Seattle, WA, May 5-9, 2024.
