Abstract
Purpose:
This study evaluated the performance of a semi-automated algorithm designed to populate a research database with electronic health record (EHR) data while reducing manual data-entry burden and flagging errors.
Methods:
We developed a semi-automated algorithm to extract data from structured fields of the University of Michigan EHR for participants enrolled in the Michigan Screening and Intervention for Glaucoma and Eye Health through Telemedicine (MI-SIGHT) Program. Data collected included 76 elements from the technician eye exam, such as medical history, visual acuity, and refraction, and 196 elements from the physician screening results, such as assessment of the external and fundus photographs. A random sample of participants was identified, and data was extracted from the EHR through 1) the semi-automated algorithm, 2) manual extraction, and 3) gold-standard double data entry. Algorithm performance and manual data extraction were compared with double data entry, and error rates were calculated. The algorithm non-completion rate, defined as the proportion of data elements that still required manual data entry, was also measured. Algorithm training continued iteratively until pattern errors could no longer be identified.
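A minimal sketch of the flag-or-extract step described above, assuming a simple pattern-matching approach, is shown below; the field names, patterns, and sentinel value are illustrative and are not drawn from the MI-SIGHT implementation.

```python
import re
from typing import Optional

# Illustrative sketch (not the MI-SIGHT implementation): extract a value when it
# matches an expected pattern, otherwise flag the element for manual data entry.

# Hypothetical pattern for a Snellen visual-acuity field such as "20/40".
VA_PATTERN = re.compile(r"^\s*20/(\d{2,3})\s*$")

FLAG_FOR_MANUAL_ENTRY = None  # sentinel: element still requires manual entry


def extract_visual_acuity(raw: str) -> Optional[str]:
    """Return a normalized value if the entry matches the expected pattern,
    otherwise flag it (e.g., typographic errors, non-conventional entries)."""
    match = VA_PATTERN.match(raw)
    return f"20/{match.group(1)}" if match else FLAG_FOR_MANUAL_ENTRY


def populate_row(ehr_fields: dict) -> tuple:
    """Map raw EHR fields to database values; collect flagged element names."""
    extractors = {"visual_acuity_od": extract_visual_acuity}  # one of many elements
    row, flagged = {}, []
    for name, extractor in extractors.items():
        value = extractor(ehr_fields.get(name, ""))
        if value is FLAG_FOR_MANUAL_ENTRY:
            flagged.append(name)  # counts toward the non-completion rate
        else:
            row[name] = value
    return row, flagged


# Example: "20/40" is extracted automatically; a non-conventional entry such as
# "20/40ish" would be flagged for manual entry.
print(populate_row({"visual_acuity_od": "20/40"}))
```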
Results:
Fifty of the 1288 enrolled participants, along with their technician exam progress notes and physician screening results, were randomly selected for evaluation. Manual data entry took approximately 12 hours for the 50 participants, and the semi-automated algorithm took 2 hours. For technician exams, the algorithm flagged 15.8% of the data (n=585/3700) for manual entry; reasons for flagging included typographic errors and non-conventional entries. Of the remaining entries that were entered automatically, the error rate was 0.5% (n=15) for manual extraction and 0.1% (n=3) for the algorithm (p=0.005, McNemar's test). For the physician screening results, 7.0% of the data (n=686/9800) was flagged for manual entry. Of the remaining data, the error rate was 0.1% (n=9) for manual extraction and 0% for the algorithm (p=0.003, McNemar's test).
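For reference, the paired error comparison above can be run as an exact McNemar's test; a sketch using statsmodels follows. The 2x2 cell placements are assumptions, since the abstract reports only the totals (15 manual-extraction errors and 3 algorithm errors among the 3115 automatically entered technician-exam elements); if any element were misclassified by both methods, the discordant counts and the resulting p-value would change.

```python
from statsmodels.stats.contingency_tables import mcnemar

# Assumed paired 2x2 table for the technician-exam comparison, with both methods
# judged against gold-standard double data entry.
# Rows: manual extraction (error / correct); columns: algorithm (error / correct).
table = [
    [0,   15],    # [both in error, manual error only]
    [3, 3097],    # [algorithm error only, both correct]
]

# Exact McNemar's test uses only the discordant (off-diagonal) cells.
result = mcnemar(table, exact=True)
print(f"statistic={result.statistic:.0f}, p-value={result.pvalue:.4f}")
```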
Conclusions:
The algorithm showed strong performance in automatic data extraction and substantially reduced manual data entry burden. This type of process could improve both rigor and reproducibility in studies with large samples and numerous data elements. Next steps involve testing the algorithm in a new sample.
This abstract was presented at the 2022 ARVO Annual Meeting, held in Denver, CO, May 1-4, 2022, and virtually.