Abstract
Purpose:
Patients frequently turn to online search engines for answers to medical questions and can be expected to increasingly ask these questions of artificial intelligence (AI) chatbots. We investigated the ability of 3 novel AI chatbots (ChatGPT, Google Bard, and Bing Chat) to provide reliable, accurate, and consistent information to patients.
Methods:
The 3 variations of Bing Chat (Balanced, Creative, and Concise) were treated as separate chatbots. We compared the ability of these programs to answer personalized patient questions such as "How do I fix my droopy eyelids?" and to create unique educational resources from prompts such as "Create me a question and answer sheet regarding eyelid ptosis repair." The quality of each chatbot response was rated for medical accuracy, comprehensiveness, and coherence on a 4-point scale (1-4) by 4 practicing ophthalmologists, each blinded to which chatbot produced the response (Table 1).
Results:
When asked to answer patient questions, the chatbots received average ratings ranging from 1.3 to 2.4 out of 4. Bing Chat (Creative) produced the highest-quality and most accurate responses to patient questions, with a score of 2.4 (Table 1); Bing Chat (Concise) scored the lowest at 1.3. When asked to create patient educational resources, the chatbots received average ratings ranging from 1.8 to 3.9 out of 4. Bing Chat (Creative) scored the highest (average rating: 3.9), followed by ChatGPT (average rating: 3.0). Bing Chat (Balanced) performed the poorest at creating patient resources (average rating: 1.8). Google Bard maintained a consistent quality of response across both prompt types (average rating: 2.3), while the other chatbots performed better with one prompt type than the other.
Conclusions:
According to the raters, the chatbots, in particular Bing Chat (Concise) and Bing Chat (Balanced), consistently lacked content, and their outputs were too simple and short. Further, responses from Google Bard and Bing Chat (Balanced) often contained missing or incorrect information, leading to lower scores for those outputs. Given the limited accuracy demonstrated by these chatbots, we caution against relying on AI chatbots when seeking health-related information until improvements in their algorithms are achieved and validated.
This abstract was presented at the 2024 ARVO Annual Meeting, held in Seattle, WA, May 5-9, 2024.