Abstract
Purpose :
Narratives that low vision patients use to describe their goals or complaints are often qualitative and difficult to organize because of the diversity of language usage. As a result, it is difficult to link patients’ descriptions of their ability to perform a task with their vision measurements. In this study, we used a natural language processing (NLP) framework to extract patterns from patients’ descriptions of their complaints and analyzed their correlations with vision measurements using machine learning methods.
Methods :
Deidentified electronic medical records of 616 low vision patients with a primary diagnosis of macular dysfunction were analyzed. Vision measurements included best-corrected visual acuity, contrast sensitivity and status of the central visual field. These measurements were used to cluster patients into different categories using K-means clustering. Multiple keywords, primarily related to daily activities (e.g., cook, drive, read), were identified based on patients’ narratives during history-taking. For each keyword, the patients’ narratives related to that keyword were processed using NLP steps that included lemmatization, stemming, synonym merging and principal component analysis (PCA), resulting in word vectors. A Naïve Bayes classifier was then used to classify the word vectors into the vision-measurement-based categories given by K-means. Eighty percent of the word vectors were used for training and 20% for testing. Test accuracy and F1 scores were used to evaluate how well the narratives related to each keyword corresponded to a specific vision-measurement-based categorization.
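As an illustration only, the steps described in the Methods can be sketched in plain Python. The sketch rests on several assumptions that the abstract does not state: the patient data are invented, acuity is expressed in logMAR, clustering uses a single measurement with k = 2, a whitespace split stands in for lemmatization, stemming, synonym merging and PCA, and the classifier is a multinomial Naïve Bayes with add-one smoothing over word counts. The names kmeans_1d, train_nb and predict_nb are hypothetical helpers, not the authors' code.

```python
import math
import random
from collections import Counter, defaultdict

# Invented toy data (NOT from the study): one logMAR acuity value and one
# "cook"-related narrative per patient.
acuity = [0.2, 0.3, 0.25, 0.3, 0.2, 1.0, 1.1, 0.9, 1.0, 1.2]
narratives = [
    "read recipe with magnifier", "read recipe label slowly",
    "read recipe with lamp", "read label with magnifier",
    "read recipe label fine", "cannot see stove dial",
    "cannot see burner flame", "cannot see stove flame",
    "burned hand cannot see", "cannot see dial stove",
]

def kmeans_1d(values, k, iters=20):
    """Cluster scalar measurements into k groups; returns one label per value."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids over the sorted values.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels

def train_nb(docs, labels):
    """Collect the class and word counts needed by multinomial Naive Bayes."""
    vocab = {w for d in docs for w in d}
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for d, y in zip(docs, labels):
        word_counts[y].update(d)
    return vocab, class_counts, word_counts

def predict_nb(model, doc):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    vocab, class_counts, word_counts = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for y, n in class_counts.items():
        denom = sum(word_counts[y].values()) + len(vocab)
        score = math.log(n / total)
        for w in doc:
            if w in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[y][w] + 1) / denom)
        if score > best_score:
            best, best_score = y, score
    return best

# Vision-measurement-based categories, then tokenized narratives.
categories = kmeans_1d(acuity, k=2)
docs = [n.lower().split() for n in narratives]

# 80/20 train/test split on a seeded shuffle of patient indices.
idx = list(range(len(docs)))
random.Random(0).shuffle(idx)
cut = int(0.8 * len(idx))
train, test = idx[:cut], idx[cut:]

model = train_nb([docs[i] for i in train], [categories[i] for i in train])
predictions = [predict_nb(model, docs[i]) for i in test]
accuracy = sum(p == categories[i] for p, i in zip(predictions, test)) / len(test)
print(categories, accuracy)
```

In this toy setting the two acuity groups use disjoint vocabularies, so the classifier separates them easily; real narratives overlap far more, which is why the study reports accuracy and F1 per keyword.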
Results :
Patients’ narratives related to specific keywords correlated differently with the various vision measurements. For example, narratives related to the keyword “cook” were classified with test accuracies of 80.4% (F1=0.81) and 60.9% (F1=0.62) into categories generated from visual acuity and contrast sensitivity, respectively. For the keyword “walk”, the accuracies for these two vision measurements were 57.5% (F1=0.55) and 70% (F1=0.66), respectively.
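For reference, accuracy and F1 scores of the kind reported above can be computed from test-set predictions as in the following sketch. The labels are invented, and macro-averaging over categories is an assumption, since the abstract does not state how F1 was averaged.

```python
def accuracy_score(y_true, y_pred):
    """Fraction of test items whose predicted category matches the true one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_for_class(y_true, y_pred, cls):
    """F1 = harmonic mean of precision and recall for a single category."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-category F1 (assumed averaging scheme)."""
    classes = sorted(set(y_true))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)

# Invented example: true K-means categories vs. classifier predictions.
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 0, 1, 1, 1, 0, 1, 0]
acc = accuracy_score(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(acc, f1)
```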
Conclusions :
We demonstrated the feasibility of a framework that combines NLP and machine learning methods to analyze patients’ narratives extracted from electronic medical records. This framework may pave the way toward a better understanding of the correlations between patients’ qualitative complaints and their quantitative vision measurements.
This abstract was presented at the 2024 ARVO Annual Meeting, held in Seattle, WA, May 5-9, 2024.