Abstract
Purpose :
To evaluate the capabilities of a large language model (LLM)-based chatbot for diagnosing glaucoma using the Ocular Hypertension Treatment Study (OHTS) dataset.
Methods :
A total of 3170 eyes of 1585 subjects from the OHTS were included in this study. For each subject, we selected demographic, clinical, ocular, visual field, optic nerve head photograph, and disease history parameters. We automated case report generation by converting the tabular data from both eyes of each subject into text using the application programming interface (API) of ChatGPT (Fig 1). We randomly selected subsets of patients and tested different final questions to engineer the prompt with the highest accuracy (Fig 2A). We then tested different combinations of parameters to assess diagnostic performance and selected the best-performing subset for downstream analysis. Finally, we developed a procedure that uses the ChatGPT API to submit the prompts automatically and query ChatGPT (3.5 and 4.0) about the underlying diagnosis of each subject based on the onset and last visits.
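The conversion of one tabular record into a textual case report followed by a final question can be illustrated with a minimal sketch. The column names, labels, and question wording below are hypothetical, since the abstract does not list the OHTS variables or the engineered prompt, and the sketch uses a plain template rather than the ChatGPT API that the study used for this step.

```python
from typing import Mapping

# Hypothetical column names and labels (assumptions for illustration only;
# the actual OHTS variables used in the study are not given in the abstract).
FIELD_LABELS = {
    "age": "Age (years)",
    "iop_od": "Intraocular pressure, right eye (mmHg)",
    "iop_os": "Intraocular pressure, left eye (mmHg)",
    "vcdr_od": "Vertical cup-to-disc ratio, right eye",
    "vcdr_os": "Vertical cup-to-disc ratio, left eye",
}

# Hypothetical final question; the study tested several wordings (Fig 2A).
FINAL_QUESTION = "Based on this report, does the patient have glaucoma? Answer yes or no."


def build_case_report(row: Mapping[str, object], question: str = FINAL_QUESTION) -> str:
    """Turn one tabular record (both eyes of one subject) into a text prompt."""
    lines = ["Case report:"]
    for column, label in FIELD_LABELS.items():
        value = row.get(column)
        if value is None:
            # Skip parameters that are missing or excluded from the tested subset.
            continue
        lines.append(f"- {label}: {value}")
    lines.append("")
    lines.append(question)
    return "\n".join(lines)


example_row = {"age": 61, "iop_od": 26, "iop_os": 25}
example_prompt = build_case_report(example_row)
print(example_prompt)
```

Restricting `FIELD_LABELS` to a subset of columns is one simple way to test different parameter combinations with the same template.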
Results :
ChatGPT3.5 achieved an AUC of 0.74, accuracy of 66%, specificity of 64%, sensitivity of 85%, and F1 score of 0.72. ChatGPT4.0 obtained an AUC of 0.76, accuracy of 87%, specificity of 90%, sensitivity of 61%, and F1 score of 0.92 based on the last visit (Fig 2B).
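For reference, the threshold-based metrics reported here have standard definitions in terms of confusion-matrix counts, sketched below with glaucoma coded as 1. This is an illustration with made-up labels, not the study's evaluation code; the abstract does not state which class or averaging scheme the reported F1 score uses, and the sketch computes F1 for the positive class only. AUC is omitted because it requires a continuous score rather than a yes/no answer.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, and positive-class F1 (1 = glaucoma)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(numerator: int, denominator: int) -> float:
        # Return NaN instead of raising when a class is absent.
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Made-up labels for illustration only (not OHTS data).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Because sensitivity depends only on the glaucoma cases and specificity only on the non-glaucoma cases, a model can have higher overall accuracy and still lower sensitivity when non-glaucoma eyes are the majority.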
Conclusions :
The accuracy of ChatGPT4.0 in diagnosing glaucoma based on OHTS data was promising. The overall accuracy of ChatGPT4.0 was higher than that of ChatGPT3.5; however, ChatGPT3.5 was more sensitive than ChatGPT4.0. Currently, ChatGPT may serve as a useful tool for exploring the disease status of ocular hypertensive eyes from clinical parameters. In the future, leveraging LLMs with multimodal capabilities to integrate demographic, clinical, and imaging data may further enhance diagnostic capabilities.
This abstract was presented at the 2024 ARVO Annual Meeting, held in Seattle, WA, May 5-9, 2024.