Abstract
Purpose :
Just 12% of the American population is considered to have proficient health literacy. Rare diseases such as retinitis pigmentosa (RP), which affects 1 in 3,000 to 4,000 people, pose a particular challenge for patients who live far from medical centers and lack access to care and resources. The recently released large language models (LLMs) ChatGPT (OpenAI) and Bard (Google), as well as AI Assistants such as Amazon Alexa and Google Assistant, are used by the lay public for health education. We seek to assess how well LLMs and AI Assistants support health education for rare diseases such as RP. Additionally, we compare ChatGPT3.5 with its newer update, ChatGPT4.0.
Methods :
Five questions relating to RP were developed:
What is a retina?
What is retinitis pigmentosa (RP)?
What is the current treatment for RP?
How do I diagnose RP?
If my mom has RP - how likely is it that I will get it?
The questions were asked verbally of Amazon Alexa and Google Assistant, and the audible responses were transcribed to text. The same questions were also posed to ChatGPT3.5, ChatGPT4.0, and Bard. Several readability metrics were used to assess the quality of each answer: Flesch-Kincaid reading ease score, Flesch-Kincaid grade level, SMOG index, Gunning-Fog score, and Dale-Chall score and grade level.
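As an illustration of how scores of this kind are computed, the Python sketch below applies the standard published formulas for Flesch reading ease, Flesch-Kincaid grade level, Gunning-Fog, and SMOG to a piece of text. It is not the tool used in this study. The function names are hypothetical, the syllable counter is a rough vowel-group heuristic, and "complex" words are approximated as any word with three or more syllables. The Dale-Chall score is omitted because it depends on a list of familiar words.

```python
import math
import re

def count_syllables(word):
    """Rough heuristic (an assumption): count vowel groups, minus a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def text_stats(text):
    """Count sentences, words, syllables, and words with three or more syllables."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = [count_syllables(w) for w in words]
    return {
        "sentences": len(sentences),
        "words": len(words),
        "syllables": sum(syllables),
        "polysyllables": sum(1 for s in syllables if s >= 3),
    }

def readability(text):
    """Apply the standard formulas; assumes at least one sentence and one word."""
    s = text_stats(text)
    words_per_sentence = s["words"] / s["sentences"]
    syllables_per_word = s["syllables"] / s["words"]
    return {
        "flesch_reading_ease":
            206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word,
        "flesch_kincaid_grade":
            0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59,
        "gunning_fog":
            0.4 * (words_per_sentence + 100 * s["polysyllables"] / s["words"]),
        "smog":
            1.0430 * math.sqrt(s["polysyllables"] * 30 / s["sentences"]) + 3.1291,
    }

# Example call on an invented sentence (not a response from the study).
scores = readability("The retina is a thin layer of tissue. It lines the back of the eye.")
```

Lower reading ease and higher grade-level values both indicate harder text, so a response with long sentences and many polysyllabic words will score as less readable on all four measures.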
Results :
LLM responses contained more words than AI Assistant responses (Mann-Whitney U, p=0.00188), but no readability metric differed significantly between LLMs and AI Assistants. ChatGPT4.0 responses contained fewer words than ChatGPT3.5 responses (Mann-Whitney U, p=0.0366). Bard's responses were the most readable, scoring at high-school level on every response, while the ChatGPT and AI Assistant responses included at least one college-level score.
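For readers unfamiliar with the Mann-Whitney U test, the Python sketch below shows one way to compare word counts between two groups of responses. It is a minimal illustration, not the analysis pipeline used here: the function name is hypothetical, the p-value uses the normal approximation with a tie correction and no continuity correction (small samples are usually better served by an exact test), and the word counts are invented placeholders rather than data from this study.

```python
import math

def mann_whitney_u(x, y):
    """Return (U, two-sided p) using the normal approximation with tie correction."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which group each value came from.
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1          # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1                        # size of this group of ties
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    u = min(u1, u2)
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))     # two-sided tail probability
    return u, p

# Invented placeholder word counts, NOT the study's data.
llm_word_counts = [210, 185, 240, 198, 225]
assistant_word_counts = [45, 60, 38, 72, 55]
u_stat, p_value = mann_whitney_u(llm_word_counts, assistant_word_counts)
```

Because the test works on ranks rather than raw values, it makes no assumption that word counts are normally distributed, which suits small samples of responses with skewed lengths.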
Conclusions :
Both LLMs and AI Assistants are capable of producing excellent, informative responses that can be used to educate patients. The wide adoption of these technologies makes them broadly accessible; however, verbose and complex sentences may impede their use for medical education in the general population. ChatGPT4.0 responses use fewer words than ChatGPT3.5 responses, suggesting that with more data and human interaction, LLMs will become better tools for patient education. Such rapid progress makes AI an exciting new field for physicians to explore.
This abstract was presented at the 2024 ARVO Annual Meeting, held in Seattle, WA, May 5-9, 2024.