Abstract
Purpose :
Amblyopia is one of the most common causes of treatable vision loss in children. Accurately identifying patients with amblyopia in the electronic health record (EHR) is crucial for effective clinical care, enabling practice alerts, facilitating clinical trials, and supporting quality measures and research. However, amblyopia diagnosis billing codes are frequently omitted from patient records. To address this gap, we propose using natural language processing (NLP) to determine a patient's amblyopia status from visit notes, and we compare the performance of a fine-tuned, pre-trained BioClinical BERT model with that of zero-shot prompting of large language models (LLMs).
Methods :
Our dataset included visit notes and billing codes from new-patient office visits between 2015 and 2022 in the Pediatrics and Strabismus clinics at OHSU, restricted to patients aged 9 years or younger. We manually reviewed these notes and labeled each one as “Amblyopia,” “Not Amblyopia,” or “Suspect Amblyopia.” A subset of the notes from patients with amblyopia was additionally annotated with the amblyopia subtype (strabismic, deprivation, or refractive). To evaluate the effectiveness of current billing codes, we randomly selected 2,000 patient notes and assessed the accuracy of their billing diagnoses. For the NLP approaches, we fine-tuned the BioClinical BERT model on the labeled clinical notes and evaluated the Flan-T5 LLM with zero-shot prompting. The dataset was split 70%/15%/15% into training, validation, and test sets.
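The 70%/15%/15% split can be illustrated with a minimal sketch. It assumes a label-stratified split at the note level; the abstract does not state whether the split was stratified or whether it was made by note or by patient, and the function name, seed, and rounding rule are hypothetical. The class counts are those reported in the Results.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Return (train, val, test) index lists that preserve label proportions.

    Assumption: stratification by label; the abstract only gives the ratios.
    """
    rng = random.Random(seed)

    # Group note indices by their annotated label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train, val, test = [], [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        # Whatever remains in this class goes to the test set.
        test += indices[n_train + n_val:]
    return train, val, test


# Class counts as reported in the Results section.
labels = (
    ["Amblyopia"] * 2089
    + ["Not Amblyopia"] * 1339
    + ["Suspect Amblyopia"] * 298
)
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))
```

Stratifying keeps the small “Suspect Amblyopia” class represented in all three sets, which matters when a macro-averaged metric is reported.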
Results :
A total of 3,726 notes were randomly selected and manually annotated, identifying 2,089 amblyopia cases, 1,339 non-amblyopia cases, and 298 suspect amblyopia cases. Of the amblyopia notes, 900 were further annotated with one of the three subtypes of the condition. The BioClinical BERT model achieved the best performance, with a macro-average AUROC of 0.992 and an accuracy of 0.977 in determining amblyopia diagnoses (Table 1 and Figure 1). The zero-shot Flan-T5 model also outperformed billing codes alone.
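For readers unfamiliar with the metric, a macro-average AUROC for a three-class problem can be computed as sketched below. This assumes a one-vs-rest formulation with an unweighted mean over classes, which the abstract does not specify; the function names are hypothetical, and the scores are toy values for illustration, not study data.

```python
def binary_auroc(scores, is_positive):
    """AUROC as the probability that a positive outranks a negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auroc(y_true, y_prob, classes):
    """Unweighted mean of one-vs-rest AUROCs.

    y_prob[i][k] is the model's score for classes[k] on example i.
    """
    per_class = []
    for k, cls in enumerate(classes):
        scores = [row[k] for row in y_prob]
        per_class.append(binary_auroc(scores, [y == cls for y in y_true]))
    return sum(per_class) / len(per_class)


# Toy example (not study data): six notes, three classes.
classes = ["Amblyopia", "Not Amblyopia", "Suspect Amblyopia"]
y_true = [
    "Amblyopia", "Amblyopia",
    "Not Amblyopia", "Not Amblyopia",
    "Suspect Amblyopia", "Suspect Amblyopia",
]
y_prob = [
    [0.8, 0.1, 0.1],
    [0.4, 0.1, 0.5],
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.2, 0.5],
    [0.5, 0.2, 0.3],
]
print(macro_auroc(y_true, y_prob, classes))
```

Because the macro average weights each class equally, the 298 suspect cases contribute as much to the score as the 2,089 amblyopia cases.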
Conclusions :
Our findings indicate that billing codes alone are inadequate for accurately identifying patients with amblyopia. In contrast, NLP approaches exhibit much higher accuracy and precision. Moreover, the zero-shot Flan-T5 results suggest the potential of LLMs for rapid phenotyping and enhanced interpretability.
This abstract was presented at the 2024 ARVO Annual Meeting, held in Seattle, WA, May 5-9, 2024.