Abstract
Purpose:
After-visit summaries (AVS) can strengthen doctor-patient communication and improve health outcomes. However, AVS may not routinely be provided at every clinical visit, potentially due to the extra time required to create additional documentation. Recent legislation mandates patient access to clinical notes, but these are often written at a sophisticated reading level or contain abbreviations and jargon, all of which limit patient understanding.
ChatGPT, an artificial intelligence (AI) model, can appropriately answer questions concerning retinal disease. The purpose of this study was to determine whether ChatGPT-4, the most updated version of the model, can generate accurate, readable AVS for common retinal conditions.
Methods:
One to three conditions from each of the following disease categories were selected by an experienced, fellowship-trained retina specialist, with the intention of capturing a set of conditions that may be routinely encountered by an adult retina specialist: "vascular," "macular," "peripheral," "inflammatory or infectious," "neoplastic," "toxic," and "surgical."
Two clinical notes written between 2020 and 2023 were randomly obtained for each condition. AVS generated from each note were graded by three practicing ophthalmologists, with senior author adjudication, on whether they accurately described (1) the diagnosis, (2) the clinical visit, and (3) the follow-up plan. Inaccurate responses were further categorized as "Incorrect," "Omission," or "Hallucination." Reading level of the responses was assessed using the Flesch-Kincaid readability test.
Results:
Thirty-eight AVS describing 19 retinal conditions written by 12 physicians were generated. The mean (standard deviation) word count of the notes was 242 (53) (range: 119 to 361). Descriptions of the diagnosis, clinical visit, and follow-up were accurate for 30 (79%), 20 (53%), and 26 (68%) of the AVS, respectively. Incorrect information was the most common error type (5 [13%], 12 [32%], and 7 [18%], respectively), whereas hallucination was noted in 6 (16%) notes. There was no difference in accuracy as a function of note word count or author. Flesch-Kincaid scores indicated that patients would require between 3.4 and 12.5 years of education to understand the responses.
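The Flesch-Kincaid grade level reported above can be computed from word, sentence, and syllable counts via the standard formula 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59. A minimal sketch is shown below; the syllable counter is a naive vowel-group heuristic assumed for illustration, not the method used in this study.

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic (assumption for illustration): each run of
    # consecutive vowels counts as one syllable, minimum of one.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    # Standard Flesch-Kincaid grade-level formula:
    # 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / sentences)
            + 11.8 * (syllables / len(words))
            - 15.59)
```

Short, monosyllabic sentences score near early grade levels, while dense clinical prose with polysyllabic terminology scores far higher, which is the gap the readability analysis above quantifies.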
Conclusions:
AI could be used to create clinical summaries, which may improve doctor-patient communication and health outcomes. Further work is necessary to ensure the output is readable and completely accurate.
This abstract was presented at the 2024 ARVO Annual Meeting, held in Seattle, WA, May 5-9, 2024.