Abstract
Purpose :
Deep learning (DL) methods for automated retinal OCT screening often produce overconfident predictions for pathologies outside their training classes (outliers), which hinders their translation to clinical use. This study aims to improve the robustness and reliability of DL-based diagnostic systems when they are presented with disease types not included in training.
Methods :
Outlier exposure (OE), i.e. training a model with a small number of outlier cases, was explored to improve outlier detection during automated screening for age-related macular degeneration (AMD) on retinal OCTs. We used a multi-center dataset with four target classes: non-pathological, intermediate AMD (iAMD), neovascular AMD (nAMD) and geographic atrophy (GA); and three outlier classes: diabetic macular edema (DME), retinal vein occlusion (RVO), and Stargardt disease. We fine-tuned a DL model (EfficientNetV2-B0) for central B-scan classification. The tested approach was entropy normalization OE, i.e. pushing the predicted probabilities for outlier samples toward the uniform distribution. As a baseline, the network was trained without OE. Each sample’s outlier score was computed with three metrics: the maximum predicted class probability (MP), the Entropy of the output probabilities, and the Cosine distance based on the features of the penultimate network layer (Fig. 1). The target classes comprised 3364 OCTs (2661 patients), split patient-wise into 70% training, 15% validation, and 15% testing; 295 outlier samples were included in the test set (162 DME, 19 Stargardt and 114 RVO), and 500 OCTs were available for OE subset selection.
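The three outlier scores and the OE objective described above can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation: the function names, the weight `lam`, and the use of per-class mean feature vectors as the reference for the Cosine distance are all assumptions, since the abstract does not specify these details.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mp_score(probs):
    # Maximum-probability score: low confidence -> high outlier score.
    return 1.0 - probs.max(axis=1)

def entropy_score(probs, eps=1e-12):
    # Shannon entropy of the predicted distribution.
    return -(probs * np.log(probs + eps)).sum(axis=1)

def cosine_score(features, class_means, eps=1e-12):
    # Assumption: distance to the nearest per-class mean of
    # penultimate-layer features computed on training inliers.
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + eps)
    m = class_means / (np.linalg.norm(class_means, axis=1, keepdims=True) + eps)
    return (1.0 - f @ m.T).min(axis=1)

def oe_loss(inlier_logits, inlier_labels, outlier_logits, lam=0.5, eps=1e-12):
    # Cross-entropy on inliers plus a term that is minimized when the
    # outlier predictions are uniform. `lam` is a hypothetical weight.
    p_in = softmax(inlier_logits)
    idx = np.arange(len(inlier_labels))
    ce = -np.log(p_in[idx, inlier_labels] + eps).mean()
    p_out = softmax(outlier_logits)
    uniform_ce = -np.log(p_out + eps).mean(axis=1).mean()
    return ce + lam * uniform_ce
```

For all three scores, a higher value marks a sample as more outlier-like, so each can be passed directly to an AUC computation with outliers as the positive class.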
Results :
Exposing the network to a small number of outlier cases increased outlier detection performance without degrading inlier classification performance, for which the macro-average area under the receiver operating characteristic curve (AUC) was 0.98. Without OE, the AUC for the identification of outliers was MP: 0.69; Entropy: 0.71; Cosine distance: 0.85. With four outliers exposed per class, these AUCs increased to MP: 0.78; Entropy: 0.80; Cosine distance: 0.90. Thus, combining OE with the Cosine distance improved outlier detection by 30% relative to the baseline with MP scoring (AUC 0.90 vs. 0.69).
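The 30% figure appears to be a relative gain in AUC over the MP baseline. A short check using only the AUCs reported above (the dictionary and function names are illustrative):

```python
# Outlier-detection AUCs as reported in the Results.
baseline_auc = {"MP": 0.69, "Entropy": 0.71, "Cosine": 0.85}  # without OE
oe_auc = {"MP": 0.78, "Entropy": 0.80, "Cosine": 0.90}        # with OE

def relative_gain(new, ref):
    # Fractional improvement of `new` over the reference value `ref`.
    return (new - ref) / ref

# OE with Cosine distance versus the no-OE baseline with MP scoring.
gain = relative_gain(oe_auc["Cosine"], baseline_auc["MP"])
print(f"{gain:.1%}")
```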
Conclusions :
Exposing the network to a few non-AMD examples improved the detection of unrelated pathologies in the context of automated AMD screening, making such DL systems more reliable and trustworthy, and better suited to future use in a clinical setting.
This abstract was presented at the 2023 ARVO Annual Meeting, held in New Orleans, LA, April 23-27, 2023.