Investigative Ophthalmology & Visual Science
June 2024, Volume 65, Issue 7
Open Access
ARVO Annual Meeting Abstract
Comparative Assessment of AI-Driven Chatbots for Ophthalmic Patient Information: Inaccuracies in Chatbots' Answers to Patient Questions
Author Affiliations & Notes
  • Michael Oca
    Ophthalmology, University of California San Diego Health System, San Diego, California, United States
  • Alomi Parikh
    Ophthalmology, University of Southern California, Los Angeles, California, United States
  • Jordan Conger
    Ophthalmology, University of Southern California, Los Angeles, California, United States
  • Allison McCoy
    Del Mar Plastic Surgery, California, United States
  • Leo Meller
    Ophthalmology, University of California San Diego Health System, San Diego, California, United States
  • Katherine Wilson
    Ophthalmology, University of Southern California, Los Angeles, California, United States
  • Sandy Zhang-Nunes
    Ophthalmology, University of Southern California, Los Angeles, California, United States
  • Footnotes
    Commercial Relationships: Michael Oca, None; Alomi Parikh, None; Jordan Conger, None; Allison McCoy, None; Leo Meller, None; Katherine Wilson, None; Sandy Zhang-Nunes, None
    Support: None
Investigative Ophthalmology & Visual Science June 2024, Vol.65, 348. doi:
Citation: Michael Oca, Alomi Parikh, Jordan Conger, Allison McCoy, Leo Meller, Katherine Wilson, Sandy Zhang-Nunes; Comparative Assessment of AI-Driven Chatbots for Ophthalmic Patient Information: Inaccuracies in Chatbots' Answers to Patient Questions. Invest. Ophthalmol. Vis. Sci. 2024;65(7):348.

Abstract

Purpose : Patients frequently turn to online search engines for answers to medical questions and can be expected to increasingly pose these questions to artificial intelligence (AI) chatbots. We investigated the ability of 3 novel AI chatbots (ChatGPT, Google Bard, and Bing Chat) to provide reliable, accurate, and consistent information for patients.

Methods : The 3 variations of Bing Chat (Balanced, Creative, and Concise) were treated as separate chatbots. We compared the ability of these programs to answer personalized patient questions such as “How do I fix my droopy eyelids?” and to create unique educational resources from inputs such as “Create me a question and answer sheet regarding eyelid ptosis repair.” The quality of each chatbot response was rated for medical accuracy, comprehensiveness, and coherency on a 1-4 scale by 4 practicing ophthalmologists, who were blinded to which chatbot produced each response (Table 1).
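
For illustration only, and not the authors' analysis code: a minimal Python sketch of how blinded 1-4 ratings of anonymized chatbot responses could be aggregated into per-chatbot averages. All chatbot names, rater labels, scores, and the data layout below are hypothetical.

    # Minimal sketch (hypothetical data): averaging blinded 1-4 quality ratings.
    from statistics import mean

    # Each record: (chatbot, rater, prompt_type, rating on the 1-4 scale).
    # During grading, raters would see only an anonymized response,
    # not the chatbot name recorded here.
    ratings = [
        ("Bing Chat (Creative)", "rater1", "patient_question", 3),
        ("Bing Chat (Creative)", "rater2", "patient_question", 2),
        ("Bing Chat (Concise)", "rater1", "patient_question", 1),
        ("Bing Chat (Concise)", "rater2", "patient_question", 2),
        ("ChatGPT", "rater1", "educational_resource", 3),
        ("ChatGPT", "rater2", "educational_resource", 3),
    ]

    def average_by_chatbot(records, prompt_type):
        """Mean rating per chatbot for one prompt type."""
        scores = {}
        for bot, _rater, ptype, rating in records:
            if ptype == prompt_type:
                scores.setdefault(bot, []).append(rating)
        return {bot: mean(vals) for bot, vals in scores.items()}

    print(average_by_chatbot(ratings, "patient_question"))
    # e.g. {'Bing Chat (Creative)': 2.5, 'Bing Chat (Concise)': 1.5}

Grouping by prompt type mirrors the study's two tasks: answering patient questions and creating educational resources.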

Results : Chatbots received average ratings ranging from 1.3 to 2.4 out of 4 when asked to answer patient questions. Bing Chat (Creative) had the highest quality and accuracy of responses to patient questions, with a score of 2.4 (Table 1); Bing Chat (Concise) scored the lowest, at 1.3. When asked to create patient educational resources, the chatbots received average ratings ranging from 1.8 to 3.9 out of 4. Bing Chat (Creative) scored the highest (average rating: 3.9), followed by ChatGPT (average rating: 3). Bing Chat (Balanced) performed the poorest at creating patient resources (average rating: 1.8). Google Bard maintained a consistent quality of response across both prompt types (average: 2.3 out of 4), while the other chatbots performed better on one prompt type than the other.

Conclusions : According to raters, the chatbots, in particular Bing Chat (Concise) and Bing Chat (Balanced), consistently lacked content, and their outputs were too simple and short. Further, outputs from Google Bard and Bing Chat (Balanced) often had missing or incorrect information, leading to lower scores. Given the limited accuracy demonstrated by these chatbots, we warn against reliance on AI chatbots for health-related information until improvements in their algorithms are achieved and validated.

This abstract was presented at the 2024 ARVO Annual Meeting, held in Seattle, WA, May 5-9, 2024.

 

Table 1. Chatbot Quality Grading Scale

 

Figure 1. Physician Ratings of AI Responses to Patient Questions.
