Abstract
Purpose:
With the emergence of advanced large language models such as ChatGPT, there is a growing trend toward leveraging artificial intelligence for improved clinical support. This study aims to evaluate ChatGPT-4’s diagnostic performance in analyzing retinal pathologies.
Methods:
Twenty-four clinical cases covering diverse retinal conditions were sourced from a publicly available online database and input into ChatGPT-4. Each case scenario’s text was presented to the model three times, soliciting suggestions for both differential and final diagnoses. The process was then repeated with the clinical images included alongside the text. Model accuracy was evaluated by comparing the final and differential diagnoses with those provided by four ophthalmologists: two relying solely on the textual information and two using both text and images. Statistical comparisons were performed using chi-square and Fisher’s exact tests.
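As an illustration of the statistical comparison named above (a minimal sketch only; the abstract does not specify the software used, and this is not the authors’ analysis code), the snippet below contrasts two accuracy counts over the same 24 cases in a 2x2 contingency table using both chi-square and Fisher’s exact tests. The counts 13/24 and 19/24 are taken from the Results (ChatGPT text-only final diagnosis and the lowest-performing specialist, respectively).

    # Minimal sketch, not the authors' code: comparing two diagnostic
    # accuracy proportions over the same 24 cases. Counts come from the
    # Results section; the resulting p-values will not necessarily match
    # those reported, since the original analysis details are not given.
    from scipy import stats

    n_cases = 24
    chatgpt_correct = 13      # ChatGPT-4, text-only final diagnosis (54.1%)
    specialist_correct = 19   # specialist at 79.1% accuracy

    # 2x2 contingency table: rows = rater, columns = correct / incorrect
    table = [
        [chatgpt_correct, n_cases - chatgpt_correct],
        [specialist_correct, n_cases - specialist_correct],
    ]

    odds_ratio, p_fisher = stats.fisher_exact(table)
    chi2, p_chi2, dof, _ = stats.chi2_contingency(table)

    print(f"Fisher's exact test: p = {p_fisher:.3f}")
    print(f"Chi-square test:     p = {p_chi2:.3f} (dof = {dof})")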
Results:
ChatGPT’s diagnostic accuracy was 54.1% (13 correct answers) in text-only scenarios, improving to 58.3% (14 correct answers) when clinical images were included. The ophthalmologists demonstrated higher accuracy, ranging from 79.1% to 83.3%. For the top three differential diagnoses, ChatGPT reached 66.6% accuracy (16 correct answers) with text-only input and 70.8% (17 correct answers) when the relevant images were included, whereas the specialists identified differential diagnoses with accuracy rates between 91.6% and 95.8%. In the final diagnosis, ChatGPT’s performance was comparable to that of one specialist with text-only input (p=0.06) and to that of two other specialists when images were included (p=0.05 and p=0.11). Although ChatGPT generally performed better in differential diagnosis, its overall performance remained notably lower than that of all the specialists (p=0.02 and p=0.03).
Conclusions:
ChatGPT demonstrated improved diagnostic performance in the analysis of retinal disease when clinical images were included. While its performance was lower than that of the specialists, the discrepancy narrowed in the final diagnoses. These results underscore the potential of ChatGPT as a supplementary tool in both medical education and practice, augmenting the clinical decision-making process.
This abstract was presented at the 2024 ARVO Annual Meeting, held in Seattle, WA, May 5-9, 2024.