Abstract
Purpose :
Three million Americans have glaucoma, but only half are aware of their disease. Online tools, such as WebMD, may have led people to seek care after learning that their symptoms may be abnormal. ChatGPT (Microsoft/OpenAI) is an AI large language model (LLM) that generates heavy internet traffic. Some patients may use ChatGPT as an online resource to check their symptoms or learn more about their possible disease. We semantically assess LLM outputs against the American Academy of Ophthalmology's (AAO) resources for glaucoma patients to determine whether AI models are accurate enough to be used for patient education.
Methods :
Headings from the AAO's page on glaucoma were converted into questions. OpenAI's gpt-3.5-Turbo and gpt-4.0 APIs were queried with a total of 15 questions:
What is the main cause of glaucoma?
What is open-angle glaucoma (OAG)?
What is angle-closure glaucoma (ACG)?
What are symptoms of OAG?
What are symptoms of ACG?
Do glaucoma suspects have symptoms?
What are symptoms of pigment dispersion syndrome?
Who is at risk for glaucoma?
How is glaucoma diagnosed?
Can glaucoma be stopped?
What medications are for glaucoma?
What is a trabeculoplasty?
What is an iridotomy?
What are glaucoma drainage devices?
What is a patient's role in glaucoma treatment?
Responses were compared with the AAO's answers using a semantic text comparison model from Hugging Face ("MiniLM-L6-v2"), and the soft cosine distance between each LLM response and the corresponding AAO answer was computed. Dale-Chall grade levels were used to assess the readability of each answer.
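The comparison step can be illustrated with a minimal sketch of the textbook soft cosine measure. The abstract does not state how the embeddings or the feature-similarity matrix were obtained, so the function name `soft_cosine`, the toy vectors standing in for MiniLM embeddings, and the matrix `s` are all assumptions made for illustration. With the identity matrix, the measure reduces to ordinary cosine similarity. Under this definition raw values can range from -1 to 1; any rescaling to the 0 to 1 scale reported here is not shown.

```python
import math
from typing import Optional, Sequence


def soft_cosine(
    a: Sequence[float],
    b: Sequence[float],
    s: Optional[Sequence[Sequence[float]]] = None,
) -> float:
    """Soft cosine between vectors a and b.

    s[i][j] is an assumed similarity between feature i and feature j.
    When s is None the identity matrix is used, which gives plain cosine.
    """
    n = len(a)
    if len(b) != n:
        raise ValueError("vectors must have the same length")
    if s is None:
        s = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

    def bilinear(x: Sequence[float], y: Sequence[float]) -> float:
        # Computes x^T S y with explicit loops to stay dependency-free.
        return sum(x[i] * s[i][j] * y[j] for i in range(n) for j in range(n))

    denom = math.sqrt(bilinear(a, a)) * math.sqrt(bilinear(b, b))
    if denom == 0.0:
        raise ValueError("soft cosine is undefined for a zero-norm vector")
    return bilinear(a, b) / denom


if __name__ == "__main__":
    # Hypothetical 3-dimensional "embeddings" for an LLM answer and a reference answer.
    llm_answer = [0.2, 0.7, 0.1]
    reference_answer = [0.25, 0.6, 0.2]
    print(round(soft_cosine(llm_answer, reference_answer), 3))
```

In practice the vectors would come from the sentence-embedding model rather than being written by hand, and a non-identity `s` would only matter if related features should count as partial matches.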
Results :
The soft cosine distance between the LLM responses and the AAO's answers ranged from 0.58 to 0.95. There was no significant difference between GPT-3.5 and GPT-4.0 overall; however, GPT-4.0 outperformed GPT-3.5 on every question relating to symptoms. The AAO's website had a Dale-Chall score of 8.85, corresponding to grades 11-12, while ChatGPT 3.5 and 4.0 scored 10.37 and 10.59, respectively (college graduate level).
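The mapping from Dale-Chall scores to grade bands can be sketched as follows, using the commonly published form of the New Dale-Chall formula and its standard grade table. The function names are hypothetical, and the abstract does not say which implementation or word list the authors used, so this is an illustration rather than their procedure.

```python
def dale_chall_score(difficult_words: int, words: int, sentences: int) -> float:
    """Raw New Dale-Chall score from pre-computed counts.

    "Difficult" words are those absent from the Dale-Chall familiar-word
    list; counting them is assumed to have been done elsewhere.
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    pct_difficult = 100.0 * difficult_words / words
    score = 0.1579 * pct_difficult + 0.0496 * (words / sentences)
    # The published formula adds a constant when more than 5% of words are difficult.
    if pct_difficult > 5.0:
        score += 3.6365
    return score


def dale_chall_grade_band(score: float) -> str:
    """Map a Dale-Chall score to its conventional grade band."""
    bands = [
        (5.0, "grade 4 or below"),
        (6.0, "grades 5-6"),
        (7.0, "grades 7-8"),
        (8.0, "grades 9-10"),
        (9.0, "grades 11-12"),
        (10.0, "grades 13-15 (college student)"),
    ]
    for upper_bound, label in bands:
        if score < upper_bound:
            return label
    return "college graduate"


if __name__ == "__main__":
    # Scores reported in the Results section.
    for name, score in [("AAO", 8.85), ("ChatGPT 3.5", 10.37), ("ChatGPT 4.0", 10.59)]:
        print(name, score, dale_chall_grade_band(score))
```

Applied to the reported scores, the table places 8.85 in the grades 11-12 band and both 10.37 and 10.59 in the college graduate band, which agrees with the labels given above.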
Conclusions :
LLMs can produce responses reasonably similar to existing resources developed by health professionals, as reflected by high soft cosine scores (0=fully opposite, 1=identical text). Thus, LLMs present an exciting opportunity for doctors to educate their patients, perhaps by training a custom GPT LLM on each of the major areas they treat. However, the high reading grade levels present an issue when helping patients who may not comprehend English well.
This abstract was presented at the 2024 ARVO Annual Meeting, held in Seattle, WA, May 5-9, 2024.