Abstract
Purpose:
In recent years, there has been unprecedented growth in the development of deep learning models for eye disease detection. However, most existing work does not visualize the reasoning process, making it difficult to understand how the models reach their decisions. While post-hoc explainability methods can provide some insight, they cannot explain how the model actually arrives at its prediction. In this study, we aimed to develop an interpretable deep learning model that compares fundus images with prototypes learned in a human-understandable way for the detection of referable/non-referable diabetic retinopathy (DR).
Methods:
The training dataset was retrospectively drawn from the Chinese University of Hong Kong-Sight Threatening Diabetic Retinopathy study, which contains 6167 pairs of macula-centered and optic nerve head-centered fundus images (one pair per eye). The proposed model first learned a set of prototypes representing the characteristics of each class. To make full use of the paired fundus images, we used a VGG19 model with shared weights as the feature extractor to obtain representative features. The model then computed similarity scores between the extracted features and the learned prototypes, applied a fusion technique to combine these scores, and output the final prediction based on the fused scores.
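To make the described architecture concrete, the following is a minimal PyTorch sketch of a prototype-based classifier with a shared-weights VGG19 backbone applied to both fundus views; it is not the authors' implementation. The prototype count, the distance-to-similarity transform, and the fusion step (here, averaging the two views' similarity scores) are illustrative assumptions, as are all names such as PairedPrototypeNet.

```python
# Hypothetical sketch of a paired-view prototype network; not the authors' code.
import torch
import torch.nn as nn
from torchvision.models import vgg19

class PairedPrototypeNet(nn.Module):
    def __init__(self, num_prototypes=20, num_classes=2, proto_dim=128):
        super().__init__()
        # Shared-weights VGG19 backbone used for both fundus views.
        self.backbone = vgg19(weights=None).features
        self.add_on = nn.Sequential(
            nn.Conv2d(512, proto_dim, kernel_size=1), nn.ReLU(),
            nn.Conv2d(proto_dim, proto_dim, kernel_size=1), nn.Sigmoid(),
        )
        # Learnable prototypes intended to represent each class (assumed count).
        self.prototypes = nn.Parameter(torch.rand(num_prototypes, proto_dim))
        # Final layer maps fused similarity scores to class logits.
        self.classifier = nn.Linear(num_prototypes, num_classes, bias=False)

    def similarity_scores(self, x):
        f = self.add_on(self.backbone(x))                      # B x D x H x W
        patches = f.flatten(2).transpose(1, 2)                 # B x HW x D
        protos = self.prototypes.unsqueeze(0).expand(f.size(0), -1, -1)  # B x P x D
        d = torch.cdist(patches, protos) ** 2                  # B x HW x P
        d_min = d.min(dim=1).values                            # closest patch per prototype
        # Assumed monotone distance-to-similarity transform.
        return torch.log((d_min + 1.0) / (d_min + 1e-4))       # B x P

    def forward(self, x_macula, x_onh):
        s1 = self.similarity_scores(x_macula)
        s2 = self.similarity_scores(x_onh)
        fused = (s1 + s2) / 2.0                                 # assumed fusion: average
        return self.classifier(fused)

# Example: one paired input (macula-centered, optic nerve head-centered), 224x224 RGB.
model = PairedPrototypeNet()
logits = model(torch.randn(1, 3, 224, 224), torch.randn(1, 3, 224, 224))
```

In this sketch, interpretability comes from the similarity scores themselves: each prediction can be traced back to how strongly each learned prototype matched a region of each fundus view before fusion.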
Results:
Our proposed model achieved performance comparable to conventional black-box models, with an area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, and accuracy of 0.927 (95% CI 0.904-0.947), 82.5% (95% CI 78.0-86.9), 89.8% (95% CI 86.5-93.0), and 86.4% (95% CI 83.7-89.1), respectively. In addition, our model showed good generalizability across four external datasets, with AUROCs of 0.932-0.985, sensitivities of 85.6%-93.1%, specificities of 86.7%-97.7%, and accuracies of 87.3%-97.5%.
Conclusions:
The proposed interpretable model not only performs well in detecting referable DR on both internal and external datasets but also makes its reasoning process interpretable. Our model shows potential to gain the confidence of healthcare stakeholders (e.g., doctors) for real-world clinical implementation.
This abstract was presented at the 2024 ARVO Annual Meeting, held in Seattle, WA, May 5-9, 2024.