Investigative Ophthalmology & Visual Science
June 2024
Volume 65, Issue 7
Open Access
ARVO Annual Meeting Abstract  |   June 2024
Evaluation of the accuracy of AI-generated clinical summaries from Glaucoma outpatient visits.
Author Affiliations & Notes
  • Yapei Zhang
    Massachusetts Eye and Ear, Boston, Massachusetts, United States
  • Min Shi
    Schepens Eye Research Institute of Massachusetts Eye and Ear, Boston, Massachusetts, United States
  • Daniel L. Liebman
    Massachusetts Eye and Ear, Boston, Massachusetts, United States
  • Laura Barna
    Massachusetts Eye and Ear, Boston, Massachusetts, United States
  • Louis R Pasquale
    Icahn School of Medicine at Mount Sinai, New York, New York, United States
  • Tobias Elze
    Schepens Eye Research Institute of Massachusetts Eye and Ear, Boston, Massachusetts, United States
  • David S Friedman
    Massachusetts Eye and Ear, Boston, Massachusetts, United States
  • Michael V. Boland
    Massachusetts Eye and Ear, Boston, Massachusetts, United States
  • Lucy Q Shen
    Massachusetts Eye and Ear, Boston, Massachusetts, United States
  • Mengyu Wang
    Schepens Eye Research Institute of Massachusetts Eye and Ear, Boston, Massachusetts, United States
  • Footnotes
    Commercial Relationships   Yapei Zhang None; Min Shi None; Daniel Liebman None; Laura Barna None; Louis Pasquale Twenty-Twenty, Character Bio, Code C (Consultant/Contractor); Tobias Elze None; David Friedman AbbVie, Life Biosciences, Thea Pharmaceuticals, Code C (Consultant/Contractor), Genentech, Perivision, Code F (Financial Support); Michael Boland Carl Zeiss Meditec, Topcon Healthcare, Allergan, Janssen, Code C (Consultant/Contractor); Lucy Shen FireCyte Therapeutics Inc, Code C (Consultant/Contractor); Mengyu Wang None
  • Footnotes
    Support  None
Investigative Ophthalmology & Visual Science June 2024, Vol.65, 1641. doi:
Yapei Zhang, Min Shi, Daniel L. Liebman, Laura Barna, Louis R Pasquale, Tobias Elze, David S Friedman, Michael V. Boland, Lucy Q Shen, Mengyu Wang; Evaluation of the accuracy of AI-generated clinical summaries from Glaucoma outpatient visits. Invest. Ophthalmol. Vis. Sci. 2024;65(7):1641.

      © ARVO (1962-2015); The Authors (2016-present)

Abstract

Purpose : To evaluate the accuracy of the large language model (LLM) chatbot Llama 2 (Meta AI, NYC, USA) in summarizing glaucoma clinic notes in patient-friendly language.

Methods : A random sample of deidentified clinic notes from unique patients who visited the Glaucoma Service at Massachusetts Eye and Ear from 5/1/2007 to 9/15/2023 was presented to Llama 2. Llama 2 was asked to provide paragraph and bullet-point summaries of each note in patient-friendly language and to answer the following patient-relevant clinical questions: 1. does the patient have glaucoma, and if so, what type; 2. is the patient’s glaucoma progressing; 3. summarize the treatment plan; 4. identify changes to the treatment plan; 5. identify any laser or surgical intervention offered, and if so, generate relevant educational material. An ophthalmologist reviewer evaluated the Llama 2 responses against the original clinic notes for accuracy and completeness. Each response was graded “correct” if it was accurate and complete, “partially correct” if it was accurate but incomplete or contained an equivocal clinical interpretation by the AI, or “incorrect” if it contained irrelevant or false data.
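
The three-level grading rule described above can be sketched in code. This is only an illustrative encoding: in the study the grading was done by a human ophthalmologist reviewer, and the function name and the three boolean flags below are hypothetical, chosen here to mirror the wording of the rubric.

```python
from enum import Enum


class Grade(Enum):
    CORRECT = "correct"
    PARTIALLY_CORRECT = "partially correct"
    INCORRECT = "incorrect"


def grade_response(has_false_or_irrelevant_data: bool,
                   complete: bool,
                   equivocal_interpretation: bool) -> Grade:
    """Map a reviewer's judgments to one grade (hypothetical helper).

    Assumed precedence: any irrelevant or false data makes the response
    incorrect, regardless of the other two judgments.
    """
    if has_false_or_irrelevant_data:
        return Grade.INCORRECT
    if complete and not equivocal_interpretation:
        return Grade.CORRECT
    # Accurate, but incomplete or clinically equivocal.
    return Grade.PARTIALLY_CORRECT
```

For example, an accurate summary that omits part of the treatment plan would map to `Grade.PARTIALLY_CORRECT` under this sketch.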

Results : A total of 139 clinic notes were presented. In paragraph format, 72 notes (52%) were correctly summarized and 64 notes (46%) were partially correctly summarized. In bullet-point format, 81 notes (58%) were correctly summarized and 57 notes (41%) were partially correctly summarized. Llama 2 correctly identified the glaucoma diagnosis in 80% of notes (111 notes), correctly identified whether the patient’s glaucoma was progressing in 84% (117 notes), correctly summarized the treatment plan in 83% (115 notes), correctly identified treatment changes in 88% (122 notes), and correctly identified the surgical or laser interventions offered and generated accurate educational material in 84% (117 notes). The rate of incorrect answers for the specific clinical questions ranged from 3% to 7% (Table 1). No hallucinations were observed.
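
The reported percentages can be checked against the note counts with a short calculation. The counts below are taken from the Results; the variable names are illustrative, and the derived "incorrect" counts for the two summary formats rest on the assumption that every note received exactly one of the three grades.

```python
TOTAL_NOTES = 139

# Number of notes graded "correct" for each task, as reported in the Results.
correct_counts = {
    "paragraph summary": 72,
    "bullet-point summary": 81,
    "glaucoma diagnosis": 111,
    "progression": 117,
    "treatment plan": 115,
    "treatment changes": 122,
    "interventions and education": 117,
}

# "Partially correct" counts are reported only for the two summary formats.
partial_counts = {"paragraph summary": 64, "bullet-point summary": 57}


def percent(count: int, total: int = TOTAL_NOTES) -> int:
    """Return count/total as a whole-number percentage."""
    return round(100 * count / total)


correct_percent = {task: percent(n) for task, n in correct_counts.items()}

# Assumption: the three grades are exhaustive and mutually exclusive,
# so the remainder of the 139 notes was graded "incorrect".
incorrect_counts = {
    task: TOTAL_NOTES - correct_counts[task] - partial_counts[task]
    for task in partial_counts
}

for task, pct in correct_percent.items():
    print(f"{task}: {correct_counts[task]}/{TOTAL_NOTES} = {pct}%")
```

Under that assumption, the remainders are 3 paragraph summaries and 1 bullet-point summary graded incorrect; the abstract itself does not state these two figures.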

Conclusions : Although Llama 2’s performance in this study was not accurate enough for clinical adoption, the results suggest that LLMs could summarize clinical information accurately with more focused training. With increasing adoption of open notes, using LLMs to help organize clinical notes in standardized, patient-friendly language could someday improve the patient experience and promote health literacy and understanding.

This abstract was presented at the 2024 ARVO Annual Meeting, held in Seattle, WA, May 5-9, 2024.

 
