Abstract
Purpose :
Manual analysis of fundus images in diabetic retinopathy screening programmes is time-consuming because of the large number of patients and limited resources. We propose an end-to-end deep learning framework for automatic diabetic retinopathy (DR) grading.
Methods :
We used the largest public retinal image dataset, provided by EyePACS for the Diabetic Retinopathy Detection challenge, which comprises 35,126 images for training and 53,576 for testing and reflects a real-world scenario with images affected by noise. Only images of sufficient quality for analysis underwent the DR grading stage: 35,729 images were discarded, leaving 52,973 for subsequent analysis. A randomly selected 10% of the retained training images was held out for validation, giving 18,860 images for training, 2,096 for validation and 32,017 for testing.
Our approach is based on an attention mechanism that attends separately to the dark and the bright structures of the retina. The framework includes an image quality assessment stage and additional deep learning techniques such as data augmentation, transfer learning and fine-tuning. Xception was used as the feature extractor, and the focal loss function was used to deal with class imbalance. The Kaggle DR detection dataset was used for method development and validation.
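As an illustration of how the focal loss counteracts class imbalance, the following is a minimal sketch of the standard multi-class form, FL = -(1 - p_t)^gamma * log(p_t), where p_t is the probability predicted for the true class. The abstract does not report the hyperparameters used, so gamma = 2 is an assumed, commonly used default, and the function name `focal_loss` is hypothetical rather than taken from our implementation.

```python
import math

def focal_loss(probs, labels, gamma=2.0):
    """Mean multi-class focal loss (illustrative sketch).

    probs  : list of per-sample class-probability lists
    labels : list of integer true-class indices
    gamma  : focusing parameter (assumed value; gamma=0 gives cross-entropy)
    """
    total = 0.0
    for row, label in zip(probs, labels):
        # Probability assigned to the true class, clipped to avoid log(0).
        p_t = min(max(row[label], 1e-12), 1.0)
        # Well-classified samples (p_t near 1) are down-weighted
        # by the modulating factor (1 - p_t)^gamma.
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(labels)
```

With gamma = 0 the expression reduces to ordinary cross-entropy. Larger values shift the loss towards hard, typically minority-class, samples.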
Results :
The Quadratic Weighted Kappa (QWK) achieved with the proposed method was 0.78 on the test set, with a corresponding accuracy of 83.7%. However, under class imbalance, QWK is dominated by the most represented class, which is No DR in our dataset. For this reason, the confusion matrix is also important for evaluating the results.
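For reference, QWK can be computed directly from a confusion matrix. The sketch below uses the usual definition, with disagreement weights w_ij = (i - j)^2 / (k - 1)^2 and expected counts taken from the row and column totals. The function name is hypothetical and the matrix in the usage note is a toy example, not our results.

```python
def quadratic_weighted_kappa(conf):
    """QWK from a k x k confusion matrix (rows: true grade, cols: predicted).

    Assumes k > 1 and that the ratings are not all in a single class,
    otherwise the expected disagreement is zero and QWK is undefined.
    """
    k = len(conf)
    n = sum(sum(row) for row in conf)
    row_tot = [sum(row) for row in conf]
    col_tot = [sum(conf[i][j] for i in range(k)) for j in range(k)]

    observed = 0.0  # weighted observed disagreement
    expected = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2
            observed += w * conf[i][j]
            expected += w * row_tot[i] * col_tot[j] / n
    return 1.0 - observed / expected
```

For example, `quadratic_weighted_kappa([[5, 0], [0, 5]])` returns 1.0 (perfect agreement) and `quadratic_weighted_kappa([[25, 25], [25, 25]])` returns 0.0 (chance-level agreement). Because the expected term depends on the class totals, a heavily over-represented class contributes most of the weight.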
First, the No DR class (R0) was detected with high accuracy: only 2.9% (706 out of 23,962) of the R0 images were over-diagnosed. More importantly, only 0.05% (12 out of 23,962) of them were rated as severe or very severe. Conversely, the Mild DR class was easily confused with the No DR and mild-to-moderate classes. Finally, detection accuracy for class R4 was poor.
Conclusions :
Our results suggest that our framework could be a diagnostic aid for the early detection and grading of DR. Further validation and comparative studies are needed.
This abstract was presented at the 2022 ARVO Annual Meeting, held in Denver, CO, May 1-4, 2022, and virtually.