Abstract
Purpose :
To develop a natural language processing (NLP) algorithm to identify patients with active uveitic macular edema (UME) from electronic health records (EHR) data in the American Academy of Ophthalmology IRIS® Registry (Intelligent Research in Sight).
Methods :
To identify patients with active, non-infectious UME, fellowship-trained retina specialists defined two criteria: for structured data, a combination of ICD-10 codes for macular edema (ME) and non-infectious uveitis; for unstructured data, a list of UME keywords indicating non-historical ME in association with non-infectious uveitis. A heuristic NLP algorithm, implemented with a spaCy PhraseMatcher, was then developed to identify patients with an active UME diagnosis at a given encounter based on the unstructured-data definition. Using IRIS Registry data from January 1, 2016, to August 16, 2023, notes from 500 randomly selected patients with UME keywords in their clinical records were labeled with UME status: active UME or no/unknown active UME. This labeled dataset was split 7:3 for algorithm development and validation, and the final algorithm was evaluated on the validation set using accuracy, sensitivity, and specificity. Finally, the proposed NLP algorithm was applied to identify patients with active UME in the IRIS Registry, and the number of patients it identified was compared with the number identified using ICD-10 codes alone.
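The keyword-based identification described above can be illustrated with a minimal sketch. It uses plain regular expressions rather than the spaCy PhraseMatcher named in the abstract, so that it runs without dependencies. The keyword list, the historical/negation cue list, and the function name `has_active_ume_mention` are all hypothetical; the abstract does not give the study's actual lists or rules.

```python
import re

# Hypothetical lists for illustration only; the study's keyword and cue
# lists are not given in the abstract.
UME_KEYWORDS = [
    "uveitic macular edema",
    "uveitic cme",
    "macular edema secondary to uveitis",
]
HISTORICAL_CUES = ["history of", "h/o", "resolved", "no evidence of", "prior"]


def _phrase_pattern(phrases):
    """Compile a case-insensitive pattern matching any phrase as whole words."""
    ordered = sorted(phrases, key=len, reverse=True)  # prefer longer phrases
    alternatives = "|".join(re.escape(p) for p in ordered)
    return re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)", re.IGNORECASE)


KEYWORD_RE = _phrase_pattern(UME_KEYWORDS)
CUE_RE = _phrase_pattern(HISTORICAL_CUES)


def has_active_ume_mention(note: str) -> bool:
    """Return True if any sentence has a UME keyword and no historical cue."""
    # Crude sentence split on periods, semicolons, and line breaks.
    for sentence in re.split(r"[.;\n]+", note):
        if KEYWORD_RE.search(sentence) and not CUE_RE.search(sentence):
            return True
    return False
```

For example, a note reading "History of uveitic macular edema, resolved. Macula flat today." is not flagged under these assumed rules, whereas "OD: uveitic macular edema, worsening on OCT." is. A production system would need clinically validated keyword and cue lists and more careful sentence handling.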
Results :
On the validation set, the algorithm achieved an accuracy of 0.83, a sensitivity of 0.95, and a specificity of 0.73. Of 231,543 patients with UME keywords in their clinical records, the proposed NLP algorithm identified 129,316 as having active UME at the encounter level. By comparison, 40,277 patients were identified as having an active UME diagnosis using ICD-10 codes alone.
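For readers less familiar with these metrics, the sketch below shows how accuracy, sensitivity, and specificity follow from confusion-matrix counts, and how the two patient counts reported above compare. The toy counts are invented purely to demonstrate the formulas and are not the study's validation data; the function name `classification_metrics` is likewise an assumption.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute accuracy, sensitivity, and specificity from confusion counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,      # all correct calls / all cases
        "sensitivity": tp / (tp + fn),      # true positives / actual positives
        "specificity": tn / (tn + fp),      # true negatives / actual negatives
    }


# Toy counts for illustration only (NOT from the study).
toy = classification_metrics(tp=8, fp=3, tn=7, fn=2)

# Patient counts as reported in the abstract.
nlp_patients = 129_316
icd_patients = 40_277
ratio = nlp_patients / icd_patients  # NLP-identified vs. ICD-10-only
```

With the reported counts, `ratio` is about 3.2, meaning the NLP algorithm identified roughly three times as many patients as ICD-10 codes alone.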
Conclusions :
The proposed heuristic NLP algorithm demonstrated satisfactory performance in identifying patients with active UME in the IRIS Registry. UME patients are difficult to identify in real-world clinical research settings using structured data alone. The algorithm identified approximately three times as many patients with active UME as ICD-10 codes alone, offering an improved approach for conducting real-world evidence studies in the UME patient population.
This abstract was presented at the 2024 ARVO Annual Meeting, held in Seattle, WA, May 5-9, 2024.