Abstract
Purpose:
Unstructured free-text notes within electronic health records (EHRs) contain vast amounts of difficult-to-extract patient information, including vital components of the eye examination such as visual acuity (VA). Our purpose was to develop and evaluate the first deep learning models to identify VA measurements and their lateralities from free-text ophthalmology notes.
Methods:
A total of 333,958 clinical notes with documented VA measurements were identified from the Stanford EHR. Notes were split 80:10:10 into train, validation, and test sets. Notes were labeled with VA with/without correction and with/without pinhole for both eyes using a weakly supervised approach that leveraged EHR exam input fields. Bidirectional Encoder Representations from Transformers (BERT) models were fine-tuned to identify VA, including models pre-trained on biomedical literature (BioBERT), critical care EHR notes (ClinicalBERT), both (BlueBERT), and a lighter version of BERT with 40% fewer parameters (DistilBERT). Model performance for each entity was evaluated on a held-out test set using micro-averaged precision, recall, and F1 score, and compared with a baseline rule-based algorithm. Models were also evaluated on a human-annotated subset of the test set. A board-certified ophthalmologist qualitatively reviewed model predictions.
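For illustration only (this is not the study's code), a minimal sketch of framing VA identification as BERT token classification with the Hugging Face transformers library follows; the checkpoint is a public BioBERT release, while the entity label set and example note text are assumptions.

```python
# Minimal sketch (not the authors' pipeline): VA identification framed as
# token classification over a BERT variant. Labels and note text are
# illustrative assumptions; in the study, models would be fine-tuned on
# weakly labeled notes (e.g., via transformers.Trainer) before inference.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Hypothetical BIO label set: VA entities split by eye and correction status.
labels = ["O",
          "B-VA_RIGHT_CORRECTED", "I-VA_RIGHT_CORRECTED",
          "B-VA_LEFT_CORRECTED", "I-VA_LEFT_CORRECTED",
          "B-VA_RIGHT_UNCORRECTED", "I-VA_RIGHT_UNCORRECTED",
          "B-VA_LEFT_UNCORRECTED", "I-VA_LEFT_UNCORRECTED"]
id2label = dict(enumerate(labels))
label2id = {l: i for i, l in enumerate(labels)}

checkpoint = "dmis-lab/biobert-base-cased-v1.1"  # public BioBERT release
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForTokenClassification.from_pretrained(
    checkpoint, num_labels=len(labels), id2label=id2label, label2id=label2id)

note = "VAcc OD 20/40 OS 20/60; VAsc OD 20/200"  # made-up exam snippet
enc = tokenizer(note, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    logits = model(**enc).logits                 # (1, seq_len, num_labels)
pred_ids = logits.argmax(-1)[0].tolist()
tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
for tok, pid in zip(tokens, pred_ids):
    print(tok, id2label[pid])  # head is untrained here, so labels are random
```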
Results:
On the test set, the baseline model identified VA with an F1 score of 0.76, precision of 0.85, and recall of 0.69, micro-averaged across all VA types. BERT models achieved micro-averaged F1 scores ranging from 0.75 (ClinicalBERT) to 0.90 (BioBERT). Micro-averaged precision ranged from 0.64 (ClinicalBERT) to 0.89 (BioBERT), and micro-averaged recall ranged from 0.90 (ClinicalBERT, BlueBERT) to 0.91 (DistilBERT, BioBERT). Model performance improved on the human-annotated subset of the test set, where BlueBERT performed best (F1 0.92) and the baseline model worst (F1 0.83). Common errors included labeling VA in sections outside the examination portion of the note, difficulty labeling the current VA alongside a series of past VAs, and missing non-numeric VAs such as “cf” (count fingers).
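As a worked illustration of the reported metric (with made-up counts, not the study's data), micro-averaging pools true positives, false positives, and false negatives across all VA entity types before computing precision, recall, and F1:

```python
# Micro-averaged precision/recall/F1 across VA entity types.
# Counts are hypothetical, for illustration only.
counts = {  # entity type -> (TP, FP, FN)
    "VA_RIGHT_CORRECTED":   (900, 60, 80),
    "VA_LEFT_CORRECTED":    (880, 70, 90),
    "VA_RIGHT_UNCORRECTED": (300, 40, 50),
    "VA_LEFT_UNCORRECTED":  (290, 45, 55),
}
tp = sum(c[0] for c in counts.values())
fp = sum(c[1] for c in counts.values())
fn = sum(c[2] for c in counts.values())
precision = tp / (tp + fp)           # pooled over all entity types
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(f"micro P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```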
Conclusions:
Our study demonstrates that BERT models can identify VA from free-text ophthalmology notes with high precision and recall, with improvements over a rule-based algorithm.
This abstract was presented at the 2023 ARVO Annual Meeting, held in New Orleans, LA, April 23-27, 2023.