Abstract
Purpose :
Accurate medication data in electronic health records (EHRs) are essential for patient care and research. Previous work has shown that frequent errors in medication lists include incomplete records, duplicated prescriptions, and failure to discontinue medications. Because medication lists are inaccurate, physicians often record medication information in progress notes, which are difficult to process automatically because they are written as free-text narratives. The purpose of this study is to develop a natural language processing (NLP) model that automatically extracts medication information from free-text notes for glaucoma patients. Medication is a crucial part of glaucoma treatment, and medication data are important for glaucoma research such as predicting disease progression.
Methods :
We used an NLP technique called named entity recognition (NER) to extract medication information from clinical notes. First, we sampled 296 progress notes from 2019 office visits at OHSU with ICD-10 codes associated with glaucoma. Next, we manually annotated the text of each note for six medication-related entities. Figure 1 displays an example of the annotation. We then developed an NER model with the Python spaCy package, using the annotated dataset randomly split into 75% for training and 25% for testing. Finally, we evaluated the model on the test set by comparing its extracted entities with the manually annotated entities using F1 score, precision, and recall.
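The split and scoring steps can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not state: entities are scored by exact match on character span and label, and the helper names `split_notes` and `score`, the random seed, and the example offsets are hypothetical. It is not the authors' spaCy pipeline.

```python
import random
from typing import List, Sequence, Set, Tuple

# An entity is (start_char, end_char, label). Offsets below are illustrative only.
Entity = Tuple[int, int, str]


def split_notes(notes: Sequence, train_fraction: float = 0.75, seed: int = 0):
    """Randomly split notes into training and test subsets."""
    shuffled = list(notes)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def score(gold: List[Set[Entity]], predicted: List[Set[Entity]]):
    """Exact-match precision, recall, and F1 pooled over all notes (an assumed criterion)."""
    tp = fp = fn = 0
    for gold_ents, pred_ents in zip(gold, predicted):
        tp += len(gold_ents & pred_ents)   # annotated and extracted
        fp += len(pred_ents - gold_ents)   # extracted but not annotated
        fn += len(gold_ents - pred_ents)   # annotated but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


train, test = split_notes(range(296))
gold = [{(0, 11, "Drug"), (12, 15, "Route")}]
pred = [{(0, 11, "Drug"), (12, 14, "Route")}]  # second span boundary is off by one
precision, recall, f1 = score(gold, pred)
```

Under exact matching, a prediction with the right label but a slightly different boundary counts as both a false positive and a false negative; a more lenient overlap criterion would score it differently.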
Results :
Table 1 shows the overall and per-entity performance of the NER model on the test data. The NER model had an overall F1 score of 0.949, precision of 0.944, and recall of 0.953. Per-entity F1 scores ranged from 0.91 for “Dosage” to 0.97 for “Route”. An error analysis was performed on the false negatives and false positives for all entities. Several causes of errors were identified, including differences in note formatting, ambiguous annotation, and misclassification when medication information was spread across multiple short sentences.
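One way to organize such an error analysis is to tally false negatives and false positives per entity label. The sketch below is a hypothetical illustration, assuming exact-match comparison of (start, end, label) spans; the function name `error_counts`, the "Drug" label, and the offsets are invented for the example and are not taken from the study.

```python
from collections import Counter
from typing import Set, Tuple

Entity = Tuple[int, int, str]  # (start_char, end_char, label)


def error_counts(gold: Set[Entity], predicted: Set[Entity]):
    """Count missed (false negative) and spurious (false positive) entities per label."""
    false_negatives = Counter(label for _, _, label in gold - predicted)
    false_positives = Counter(label for _, _, label in predicted - gold)
    return false_negatives, false_positives


gold = {(0, 11, "Drug"), (12, 18, "Dosage")}
pred = {(0, 11, "Drug"), (12, 16, "Dosage"), (20, 24, "Route")}
fn_by_label, fp_by_label = error_counts(gold, pred)
```

Reading the underlying note text for each counted span is still needed to attribute an error to a cause such as note formatting or ambiguous annotation.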
Conclusions :
This study shows that NLP can be used to accurately extract glaucoma medication information from free-text EHR data; the performance of our model is similar to that of the best-performing published NLP models for medication extraction. This has implications for improving the quality and usefulness of medication data in glaucoma research.
This is a 2021 ARVO Annual Meeting abstract.