Abstract
Purpose :
To determine the feasibility of mining free text within electronic medical records (EMR) using text mining techniques to evaluate drug-disease associations in a pilot retina study. Text mining (TM) draws on natural language processing, machine learning, and statistics, and allows efficient processing of large amounts of heterogeneous free-text data.
Methods :
We analyzed all EMR data in a retrospective cohort of patients with diabetic retinopathy, patients with age-related macular degeneration (AMD), and healthy controls at the National Eye Institute. Specifically, we applied MedEx, an open-source TM tool, to extract all drug entities from physician-written notes and standardize each into its canonical drug name, dose, and route. Each drug was normalized to its generic RxNorm concept unique identifier. We filtered out negative mentions of drugs using NegBio, a tool developed in our lab for processing radiology reports that uses patterns from dependency graphs to flag negative mentions (e.g., “not taking metformin”). The final result is a list of present, positive mentions of drugs taken by each patient.
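The extract, normalize, and filter steps described above can be illustrated with a toy sketch. This is not MedEx or NegBio: the lexicon, the negation cue list, and the function name `positive_drug_mentions` are all hypothetical stand-ins, and the negation check is a simple same-sentence cue lookup rather than the dependency-graph patterns NegBio uses.

```python
import re
from typing import List

# Hypothetical lexicon: surface form -> canonical generic name.
# A real pipeline would normalize to RxNorm concept identifiers instead.
LEXICON = {
    "metformin": "metformin",
    "glucophage": "metformin",
    "lisinopril": "lisinopril",
}

# Illustrative negation cues only (an assumption for this sketch).
NEGATION_CUES = re.compile(
    r"\b(?:not taking|no longer taking|denies taking|stopped|discontinued)\b",
    re.IGNORECASE,
)


def positive_drug_mentions(note: str) -> List[str]:
    """Return canonical names of drugs mentioned without a preceding negation cue."""
    found: List[str] = []
    # Naive sentence split on whitespace that follows a period or semicolon.
    for sentence in re.split(r"(?<=[.;])\s+", note):
        for match in re.finditer(r"[A-Za-z]+", sentence):
            canonical = LEXICON.get(match.group().lower())
            if canonical is None:
                continue
            # Skip the mention if a cue appears earlier in the same sentence.
            if NEGATION_CUES.search(sentence[: match.start()]):
                continue
            if canonical not in found:
                found.append(canonical)
    return found


note = "Patient is not taking metformin. Continues Lisinopril 10 mg PO daily."
print(positive_drug_mentions(note))
```

A cue-based filter like this one misses misspelled drug names and any negation phrased outside the cue list, which are the same kinds of error the Results section reports for the full pipeline.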
Results :
In total, 18,968 patient notes were text-mined, from which 71 controls, 698 AMD patients, and 360 diabetic retinopathy patients were identified. A total of 14,851 drug mentions were obtained. Compared with human annotation on a subset of results, the precision and recall of our combined text-mining pipeline were 0.945 and 0.925, respectively. Of the false negatives, 80% were attributed to clinician misspellings, while errors in negation detection and drug abbreviations led to most false positives. Before propensity score matching, patients in the disease groups were older and had a longer follow-up interval than controls. However, there was no difference in the number of medications taken per patient among the groups (p>0.001). For several drugs previously reported in the literature to be associated with reduced rates of progression of diabetic retinopathy and/or AMD, we calculated Z-scores (Tables 1 and 2).
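For readers unfamiliar with the evaluation metrics, precision and recall against human annotation can be computed as in the sketch below. The patient identifiers and drug mentions are invented for illustration and are not study data; the function name `precision_recall` is likewise an assumption.

```python
from typing import Set, Tuple

Mention = Tuple[str, str]  # (patient id, canonical drug name)


def precision_recall(predicted: Set[Mention], gold: Set[Mention]) -> Tuple[float, float]:
    """Precision = TP / all predicted mentions; recall = TP / all gold mentions."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Invented example: human-annotated (gold) versus pipeline (predicted) mentions.
gold = {("p1", "metformin"), ("p1", "lisinopril"), ("p2", "aspirin"), ("p2", "atorvastatin")}
predicted = {("p1", "metformin"), ("p2", "aspirin"), ("p2", "metformin")}

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

In this toy case the spurious `("p2", "metformin")` lowers precision, and the two missed gold mentions lower recall.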
Conclusions :
The high performance of our TM pipeline allows large amounts of narrative text to be mined efficiently, thereby alleviating the burden of human chart review. Although preliminary data showed no clinically significant drug correlations, we are expanding our dataset to include our hospital-wide EMR system, incorporating thousands more patient records.
This is an abstract that was submitted for the 2018 ARVO Annual Meeting, held in Honolulu, Hawaii, April 29 - May 3, 2018.