Abstract
Purpose :
The prognosis and epidemiology of severe COVID-19 illness in patients with diabetic retinopathy (DR) are not well understood. Using electronic health record (EHR) data from the National COVID Cohort Collaborative (N3C) Data Enclave, we performed a retrospective cohort study and tested the hypothesis that machine learning (ML) can be applied to a multi-center national dataset to build a predictive model that identifies risk factors for hospitalization and mortality of COVID-19 patients with DR.
Methods :
We developed a random forest classifier model to identify patients at risk of hospitalization and mortality using EHR data from the N3C Data Enclave. The base population (n = 31,419) was defined as patients who had a DR diagnosis on or before the date of their first positive COVID-19 lab result or diagnosis. Data were analyzed using Python, PySpark, R, and SQL. The data were randomly split into a training set (80%) and a test set (the remaining 20%). Random forest classifier models were built on 100 features identified for training, including demographics, medications, comorbidities, procedures, and lab measurements. Model performance was evaluated on the test set using the area under the receiver operating characteristic curve (AUROC). Feature importance was determined via Shapley values.
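The split and evaluation steps described above can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only the Python standard library, the function names `train_test_split` and `auroc` are hypothetical, and the labels and scores are made-up toy values rather than N3C data. AUROC is computed here as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Randomly split rows into (train, test); test_fraction=0.2 gives an 80/20 split."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(rows)          # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 outcomes (e.g. 1 = hospitalized).
    scores: predicted risk for each case, in the same order as labels.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0        # positive case ranked above negative case
            elif p == n:
                wins += 0.5        # ties count as half
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy outcome labels and model scores (illustrative values only).
    toy_labels = [0, 0, 1, 1]
    toy_scores = [0.1, 0.4, 0.35, 0.8]
    print(auroc(toy_labels, toy_scores))

    train, test = train_test_split(range(10))
    print(len(train), len(test))
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; on a cohort of tens of thousands of patients a rank-based computation or a library routine would be the practical choice.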
Results :
The random forest classifier model achieved an AUROC of 0.7631 for predicting hospitalization and 0.8025 for predicting mortality. Important risk factors for hospitalization included patient age, comorbidities (kidney disease, heart disease, chronic lung disease), and medications. Important risk factors for mortality included lab measurements, patient age, and comorbidities. In addition, among patients with DR and COVID-19, those with a more advanced stage of DR and other diabetic complications were more likely to be hospitalized than those with an early stage of DR and fewer diabetic complications.
Conclusions :
Our results suggest that ML can be applied to a large dataset to predict clinical outcomes for patients with DR and COVID-19. In our model, age and lab measurements were the most important features for predicting COVID-19-related mortality, and the leading comorbidities associated with severe COVID-19 illness in DR patients included kidney disease, heart disease, and chronic lung disease.
This abstract was presented at the 2023 ARVO Annual Meeting, held in New Orleans, LA, April 23-27, 2023.