Abstract
Purpose :
The purpose of this study was to develop deep learning models that recognize ophthalmic examination components in free-text clinical progress notes from electronic health records (EHR), using a weak supervision approach to amass a large training corpus.
Methods :
A corpus of 39,099 ophthalmology progress notes labeled for 24 anterior and posterior segment anatomical components (named entities) of the ophthalmic examination was assembled from the EHR of a single academic center using a weakly supervised approach that automatically matches labeled EHR fields with corresponding words in the notes. The corpus was split into training, validation, and test sets. Four massively pre-trained transformer-based language models (DistilBert, BioBert, BlueBert, and ClinicalBert) were fine-tuned for this named entity recognition task. Results were compared to a baseline model based on regular expressions. Precision, recall, and F1-score were reported for each entity and micro-averaged across the test set. The same metrics were also reported on a sample of human-annotated ground-truth notes from the test set and on a sample of human-annotated notes from an independent set.
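As a rough illustration of the weak labeling idea described above, the sketch below looks for the text of each structured exam field inside the free-text note and records the matching character spans as entity labels. The abstract does not specify the matching procedure, so the function name `weak_label`, the case-insensitive exact-match rule, the field names, and the example note are all assumptions made for illustration, not the authors' implementation.

```python
from __future__ import annotations

import re


def weak_label(note: str, fields: dict[str, str]) -> list[tuple[int, int, str]]:
    """Return sorted (start, end, label) spans where a field's text occurs in the note.

    Hypothetical helper: assumes each structured field holds text that may
    appear verbatim (ignoring case and whitespace width) in the note.
    """
    spans = []
    for label, value in fields.items():
        tokens = value.split()
        if not tokens:
            continue  # empty field: nothing to match
        # Escape each token and allow any run of whitespace between tokens.
        pattern = r"\s+".join(re.escape(tok) for tok in tokens)
        for match in re.finditer(pattern, note, flags=re.IGNORECASE):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)


# Made-up example note and fields (not taken from the study data).
note = "Cornea: clear OU. Lens: 2+ NS. Macula: flat, no edema."
fields = {"cornea": "clear OU", "lens": "2+ NS", "macula": "flat, no edema"}

spans = weak_label(note, fields)
labeled = [(note[start:end], label) for start, end, label in spans]
```

Spans produced this way could then be converted to token-level tags for fine-tuning; how noisy matches or overlapping spans are resolved is a design decision this sketch leaves open.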
Results :
On the weakly labeled test set, all transformer-based models had micro-averaged recall over 0.92, with precision ranging from 0.44 to 0.85. The baseline model had lower recall (0.77) and comparable precision (0.68). On human-annotated notes from the test set, the baseline model had high recall (0.96), but its precision varied by entity (0.11 to 1.0; micro-averaged 0.57). Bert models had better performance, with recall ranging from 0.77 to 0.84 and micro-averaged precision >=0.95 for all models. On the independent notes, precision was 0.93 and recall 0.39 for the Bert model, whereas the baseline model had better recall (0.72) but poor precision (0.44).
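For readers less familiar with micro-averaging, the sketch below shows the standard calculation: true positives, false positives, and false negatives are pooled across all entities before precision, recall, and F1 are computed. The function name `micro_average` and the per-entity counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not results from this study.

```python
def micro_average(counts):
    """Pool per-entity counts, then compute precision, recall, and F1."""
    tp = sum(c["tp"] for c in counts.values())
    fp = sum(c["fp"] for c in counts.values())
    fn = sum(c["fn"] for c in counts.values())
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Made-up counts for two entities (illustrative only).
counts = {
    "cornea": {"tp": 90, "fp": 10, "fn": 30},
    "lens": {"tp": 30, "fp": 0, "fn": 10},
}

precision, recall, f1 = micro_average(counts)
```

Because counts are pooled, frequently occurring entities weigh more heavily in the micro-average than rare ones, which is one reason per-entity precision can vary widely around a single micro-averaged value.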
Conclusions :
We have developed the first deep learning system to recognize eye examination components from clinical progress notes, leveraging a novel opportunity for weak supervision to produce a large training corpus from the EHR. Transformer-based models had very high precision when evaluated against human-annotated ground-truth labels, whereas the baseline model had poor precision but higher recall. This system holds potential to improve ophthalmology cohort design and feature identification using free-text clinical progress notes.
This abstract was presented at the 2022 ARVO Annual Meeting, held in Denver, CO, May 1-4, 2022, and virtually.