Investigative Ophthalmology & Visual Science
Volume 65, Issue 7
Open Access
ARVO Annual Meeting Abstract | June 2024
Appropriateness and Readability of ChatGPT-3.5 Responses to Common Patient Questions on Age-Related Macular Degeneration
Author Affiliations & Notes
  • Nayanika Challa
    Illinois Eye and Ear Infirmary, Chicago, Illinois, United States
  • Nina Luskey
    Illinois Eye and Ear Infirmary, Chicago, Illinois, United States
  • Daniel Wang
    Retina Group of Washington, Washington, DC, United States
  • Footnotes
    Commercial Relationships: Nayanika Challa, None; Nina Luskey, None; Daniel Wang, None
    Support: None
Investigative Ophthalmology & Visual Science June 2024, Vol. 65, OD78.

      Nayanika Challa, Nina Luskey, Daniel Wang; Appropriateness and Readability of ChatGPT-3.5 Responses to Common Patient Questions on Age-Related Macular Degeneration. Invest. Ophthalmol. Vis. Sci. 2024;65(7):OD78.

      © ARVO (1962-2015); The Authors (2016-present)

Abstract

Purpose : ChatGPT, a large language model-based chatbot, is an ever-evolving source of medical information that has garnered wide interest and usage in recent years. Consequently, it is necessary to evaluate the correctness and readability of the information it provides. Age-related macular degeneration (AMD) is a leading cause of severe vision loss in the United States. Our study aims to assess the appropriateness and readability of responses generated by ChatGPT-3.5 regarding AMD.

Methods : A retrospective cross-sectional study was performed: a list of 40 frequently asked questions about AMD was curated and categorized into general knowledge, such as prevalence and visual impact (14); diagnostic and screening guidelines (10); and treatment options (16). Each question was entered into the ChatGPT-3.5 platform. Two independent ophthalmologists graded the appropriateness of each response. Readability was assessed with the Flesch-Kincaid Grade Level, Flesch Reading Ease, and Gunning Fog Index, generated using an online readability tool.
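
For readers who wish to reproduce the readability step, below is a minimal sketch in Python. The abstract names only an unspecified online readability tool; the open-source textstat package is assumed here as a stand-in, since it implements the same three standard formulas (Flesch Reading Ease = 206.835 - 1.015 x words/sentence - 84.6 x syllables/word; Flesch-Kincaid Grade Level = 0.39 x words/sentence + 11.8 x syllables/word - 15.59; Gunning Fog Index = 0.4 x [words/sentence + 100 x complex words/words]). The sample response text is hypothetical, not taken from the study.

    # Sketch of the readability scoring step (assumes: pip install textstat).
    # The responses list is a placeholder; in the study it would hold the 40
    # ChatGPT-3.5 answers, grouped by question category.
    import statistics
    import textstat

    responses = [
        "Age-related macular degeneration is a condition that affects the "
        "macula, the part of the retina responsible for central vision.",
        # ... one entry per ChatGPT-3.5 response
    ]

    metrics = {
        "Flesch-Kincaid Grade Level": textstat.flesch_kincaid_grade,
        "Flesch Reading Ease": textstat.flesch_reading_ease,
        "Gunning Fog Index": textstat.gunning_fog,
    }

    # Report mean ± standard deviation for each metric, as in the Results.
    for name, score_fn in metrics.items():
        values = [score_fn(r) for r in responses]
        sd = statistics.stdev(values) if len(values) > 1 else 0.0
        print(f"{name}: {statistics.mean(values):.1f} ± {sd:.1f}")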

Results : Responses were appropriate in 85.7% (12/14), 100% (10/10), and 62.5% (10/16) of the general knowledge, screening, and treatment questions, respectively. Responses were incomplete for 31.3% and inappropriate for 6.2% of the treatment questions. Overall, ChatGPT-3.5 generated appropriate responses 80.0% (32/40), incomplete responses 17.5% (7/40), and inappropriate responses 2.5% (1/40) of the time. The average Flesch-Kincaid Grade Level, Flesch Reading Ease, and Gunning Fog Index were 12.0 ± 1.8, 39.9 ± 7.0, and 14.4 ± 2.3 for general knowledge responses; 12.7 ± 1.0, 31.3 ± 6.2, and 15.2 ± 1.0 for screening responses; and 13.3 ± 1.1, 27.2 ± 7.6, and 15.3 ± 1.3 for treatment responses.

Conclusions : Overall, ChatGPT-3.5 responses were consistently appropriate for AMD general knowledge and screening questions but underperformed on treatment questions. When this study was conducted, ChatGPT-3.5 had been trained only on information available through 2022; important updates and advances in treatment were therefore excluded (e.g., biosimilars and dry AMD treatments). Readability of the responses was at the level of a high-school graduate or higher, whereas health-related information is recommended to be written at a fifth- to sixth-grade reading level. Understanding the current limitations of AI-guided responses in the context of health-related counseling and information sourcing is critical.

This abstract was presented at the 2024 ARVO Annual Meeting, held in Seattle, WA, May 5-9, 2024.
