Investigative Ophthalmology & Visual Science
Volume 65, Issue 7 (June 2024)
Open Access
ARVO Annual Meeting Abstract
Do AI models provide medically accurate guidance for glaucoma patients?
Author Affiliations & Notes
  • Milan Del Buono
    Bioengineering, University of California Berkeley, Berkeley, California, United States
  • Gloria Wu
    Ophthalmology, University of California San Francisco School of Medicine, San Francisco, California, United States
  • David A Lee
    The University of Texas Health Science Center at Houston John P and Katherine G McGovern Medical School, Houston, Texas, United States
  • Adrial Wong
    University of California Davis, Davis, California, United States
  • Weichen Zhao
    University of California Davis, Davis, California, United States
  • Footnotes
    Commercial Relationships   Milan Del Buono None; Gloria Wu None; David Lee None; Adrial Wong None; Weichen Zhao None
  • Footnotes
    Support  None
Investigative Ophthalmology & Visual Science June 2024, Vol.65, 1642. doi:

Milan Del Buono, Gloria Wu, David A Lee, Adrial Wong, Weichen Zhao; Do AI models provide medically accurate guidance for glaucoma patients? Invest. Ophthalmol. Vis. Sci. 2024;65(7):1642.

      © ARVO (1962-2015); The Authors (2016-present)

Abstract

Purpose : Approximately 3 million Americans have glaucoma, but only half are aware of their disease. Online tools such as WebMD may lead people to seek care after learning that their symptoms could be abnormal. ChatGPT (Microsoft/OpenAI) is an AI large language model (LLM) that generates heavy internet traffic, and some patients may use it as an online resource to check their symptoms or learn more about their possible disease. We semantically assess LLM outputs and compare them to the American Academy of Ophthalmology's (AAO) resources for glaucoma patients to determine whether AI models are accurate enough to be used for patient education.

Methods : Headings from the AAO's glaucoma page were converted into questions. OpenAI's GPT-3.5-Turbo and GPT-4.0 APIs were queried with a total of 15 questions (a sketch of the query step follows the list):

What is the main cause of glaucoma?
What is open-angle glaucoma (OAG)?
What is angle-closure glaucoma (ACG)?
What are symptoms of OAG?
What are symptoms of ACG?
Do glaucoma suspects have symptoms?
What are symptoms of pigment dispersion syndrome?
Who is at risk for glaucoma?
How is glaucoma diagnosed?
Can glaucoma be stopped?
What medications are for glaucoma?
What is a trabeculoplasty?
What is an iridotomy?
What are glaucoma drainage devices?
What is a patient's role in glaucoma treatment?
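
A minimal sketch of this query step is shown below, assuming the OpenAI Python client (v1 chat-completions API). The exact prompts, model snapshots, system instructions, and decoding settings used by the authors are not specified in the abstract.

```python
# Hypothetical sketch of the query step: each question is sent to both models
# through the OpenAI chat completions API. Model identifiers, prompt wording,
# and the absence of a system prompt are assumptions, not the authors' setup.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

questions = [
    "What is the main cause of glaucoma?",
    "What is open-angle glaucoma (OAG)?",
    # ... the remaining 13 questions listed above
]

def ask(model: str, question: str) -> str:
    """Return the model's plain-text answer to a single question."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

# Collect answers from both models for later comparison against the AAO text.
answers = {
    model: [ask(model, q) for q in questions]
    for model in ("gpt-3.5-turbo", "gpt-4")
}
```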

Responses were compared with a semantic text similarity model from Hugging Face ("MiniLM-L6-v2"), and the soft cosine similarity between the LLM responses and the AAO's answers was computed. Dale-Chall grade levels were used to assess the readability of each answer.
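
A minimal sketch of this comparison step is given below. It assumes the sentence-transformers "all-MiniLM-L6-v2" checkpoint and the textstat library, and it substitutes plain embedding cosine similarity for the soft cosine score named in the abstract; treat it as an approximation of the method, not the authors' code.

```python
# Hypothetical sketch of the comparison step, assuming the sentence-transformers
# release of MiniLM-L6-v2 and the textstat package for Dale-Chall scores. Plain
# cosine similarity over sentence embeddings stands in for the abstract's
# "soft cosine" score; the authors' exact implementation is not specified.
from sentence_transformers import SentenceTransformer, util
import textstat

encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

def semantic_similarity(llm_answer: str, aao_answer: str) -> float:
    """Cosine similarity between sentence embeddings (1 ~ identical, 0 ~ unrelated)."""
    embeddings = encoder.encode([llm_answer, aao_answer], convert_to_tensor=True)
    return util.cos_sim(embeddings[0], embeddings[1]).item()

def readability(text: str) -> float:
    """Dale-Chall readability score (e.g. 8.0-8.9 corresponds to grades 11-12)."""
    return textstat.dale_chall_readability_score(text)
```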

Results : The soft cosine similarity between the LLM responses and the AAO's answers ranged from 0.58 to 0.95. There was no significant difference between GPT-3.5 and GPT-4.0 overall; however, GPT-4.0 outperformed GPT-3.5 on every question relating to symptoms. The AAO's website had a Dale-Chall score of 8.85, corresponding to grades 11-12, while ChatGPT 3.5 and 4.0 scored 10.37 and 10.59, respectively (college-graduate level).

Conclusions : LLMs can produce responses reasonably similar to existing resources developed by health professionals, as reflected by high soft cosine scores (0 = fully opposite, 1 = identical text). LLMs thus present an exciting opportunity for doctors to educate their patients, perhaps by training a custom GPT LLM for each of the major conditions they treat. However, the high reading grade levels present a barrier for patients who may not comprehend English well.

This abstract was presented at the 2024 ARVO Annual Meeting, held in Seattle, WA, May 5-9, 2024.

 

Figure: Soft cosine scores for GPT-3.5 and GPT-4.0 are compared for each question.
