Abstract
Purpose :
To evaluate the efficacy of deep learning (DL) methods for screening diabetic patients for neuropathy (NP) from fundus images collected during normal diabetic retinopathy (DR) screening. We examine the impact of the presence or absence of DR on the predictive power for NP.
Methods :
We used 23,784 eye images from 1,564 patients in a population-based study in South India. DR was graded with the modified Klein classification (Modified Early Treatment Diabetic Retinopathy Study scales) by two independent, masked observers with high agreement (κ = 0.83). NP was considered present if the vibration perception threshold (VPT) was >20 V. VPT was measured by a single observer placing a biothesiometer probe perpendicular to the distal plantar surface of the great toe of each foot. A total of 189 patients had confirmed DR and 276 had NP, with 30 patients having both DR and NP. Images were taken during DR screening of either eye and could include glare or external (non-fundus) eye images. We held out 10% of patients, approximately stratified across diagnoses, as a test set. Convolutional DL models were trained with stratified 5-fold cross-validation for hyperparameter selection. Reported performance is the area under the receiver operating characteristic (ROC) curve (AUC), where the ROC curve plots the true positive rate (TPR) against the false positive rate (FPR). The AUC is computed for each of the 5 models on the test set and reported as the mean ± sample standard deviation across the 5 models. TPR and FPR are measured across images rather than patients.
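The evaluation protocol described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's code: the function names (`split_patients`, `roc_auc`, `summarize_models`), the random seed, and the input layout (one diagnosis label per patient, one score per test image from each of the 5 models) are all hypothetical. It shows a patient-level hold-out stratified by diagnosis, an image-level AUC, and the mean with sample standard deviation across models.

```python
import random
from collections import defaultdict
from statistics import mean, stdev


def split_patients(diagnosis_by_patient, test_fraction=0.1, seed=0):
    """Hold out roughly `test_fraction` of patients within each diagnosis group.

    Splitting by patient (not by image) keeps all images of one patient
    on the same side of the split.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for patient_id, diagnosis in diagnosis_by_patient.items():
        groups[diagnosis].append(patient_id)

    train, test = set(), set()
    for diagnosis in sorted(groups):
        ids = sorted(groups[diagnosis])
        rng.shuffle(ids)
        n_test = max(1, round(len(ids) * test_fraction))
        test.update(ids[:n_test])
        train.update(ids[n_test:])
    return train, test


def roc_auc(labels, scores):
    """Image-level AUC via the rank-sum identity; tied scores count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative image")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def summarize_models(labels, scores_per_model):
    """Mean AUC and sample standard deviation (n - 1) across models."""
    aucs = [roc_auc(labels, scores) for scores in scores_per_model]
    return mean(aucs), stdev(aucs)
```

The pairwise AUC loop is quadratic in the number of images, which is acceptable for a sketch; a sort-based implementation would be preferable for a test set of a few thousand images.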
Results :
NP can be predicted from images in the entire test cohort with an AUC of 0.710±0.003. A representative operating point on the ROC curve achieves a true positive rate of 70% at a false positive rate of 50%. On the subset of the test cohort with DR, the AUC was 0.867±0.009. Excluding patients with DR from the training and validation sets leads to an AUC of 0.684±0.014 on all test patients, and a noisier AUC of 0.873±0.050 on those with DR.
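An operating point such as the one quoted above can be read off the ROC curve by sweeping the decision threshold. The following is a minimal sketch assuming binary image-level labels (1 = NP present) and one model score per image; the function name `tpr_at_fpr` is hypothetical and is not taken from the study.

```python
def tpr_at_fpr(labels, scores, max_fpr):
    """Highest true positive rate reachable with false positive rate <= max_fpr.

    An image is called positive when its score is >= the threshold.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative image")

    best_tpr = 0.0
    # Every distinct score is a candidate threshold.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        if fp / n_neg <= max_fpr:
            best_tpr = max(best_tpr, tp / n_pos)
    return best_tpr
```

For example, `tpr_at_fpr(labels, scores, 0.5)` returns the best sensitivity available if half of the NP-negative images may be flagged, which is the kind of trade-off an initial screening step might accept.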
Conclusions :
We demonstrated that a DL model can be used to screen patients for NP based on typical images from DR screening. Predictive power increased in patients with DR, possibly because of shared microvascular pathology. Removing patients with DR from training and validation had little impact on overall performance, but made the prediction of NP for patients with DR markedly noisier across models. While the accuracy of this method likely requires refinement for clinical use, it already shows promise as an initial screening process.
This is a 2020 ARVO Annual Meeting abstract.