Abstract
Purpose :
Conventional deep learning (DL) classification methods are often overconfident and lack robustness under shifts in data distribution, which makes their predictions unreliable in out-of-distribution (OOD) scenarios. This study introduces an approach that identifies OOD samples through uncertainty quantification, specifically tailored for reliable glaucoma prediction.
Methods :
We present an OOD detection method that identifies unreliable predictions on unseen samples while simultaneously classifying images as glaucoma or non-glaucoma. Our proposed uncertainty learning model, named the Dirichlet model, adds a multi-layer perceptron on top of a conventional CNN feature extractor. We compare the OOD detection performance of the Dirichlet model with that of a baseline CNN model, referred to as the softmax model. After training on 712 fundus images from the Illinois Eye and Ear Infirmary (355 glaucoma, 357 non-glaucoma), we evaluate OOD detection and glaucoma classification on the public REFUGE and LAG fundus datasets, and OOD detection on two public non-medical datasets, CIFAR-10 and Fashion-MNIST.
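The abstract does not specify how the Dirichlet head turns network outputs into an uncertainty score. The sketch below assumes a common evidential formulation (softplus evidence, alpha = evidence + 1, uncertainty = K / sum(alpha)), which may differ from the authors' implementation. The function names `dirichlet_outputs` and `softmax_confidence` are hypothetical, and the input logits stand in for the outputs of the multi-layer perceptron head.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)); keeps evidence non-negative.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dirichlet_outputs(logits):
    """Map raw head outputs to Dirichlet parameters, class probabilities,
    and a scalar uncertainty in (0, 1].

    ASSUMPTION: evidential formulation, not confirmed by the abstract.
    """
    evidence = [softplus(z) for z in logits]
    alpha = [e + 1.0 for e in evidence]      # Dirichlet concentration parameters
    strength = sum(alpha)                    # total evidence plus number of classes
    probs = [a / strength for a in alpha]    # expected class probabilities
    uncertainty = len(alpha) / strength      # approaches 1 when evidence is near zero
    return alpha, probs, uncertainty


def softmax_confidence(logits):
    """Baseline confidence: the maximum softmax probability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    return max(exps) / sum(exps)


if __name__ == "__main__":
    # Two classes: glaucoma vs. non-glaucoma (illustrative logits only).
    for logits in ([10.0, -10.0], [-20.0, -20.0]):
        _, probs, u = dirichlet_outputs(logits)
        print(logits, [round(p, 3) for p in probs], round(u, 3),
              round(softmax_confidence(logits), 3))
```

Under this assumption, an input that produces little evidence for either class yields an uncertainty near 1, whereas the softmax baseline can only report a maximum probability of 0.5 for the same input and has no separate signal for "not enough evidence".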
Results :
The Dirichlet model consistently outperforms the softmax model in OOD detection, with improvements ranging from 2% to 19% across datasets. Table 1 shows that the Dirichlet model achieves AUCs of 64.4% and 60.0% for detecting REFUGE and LAG fundus images as OOD, along with strong performance (95.3% and 98.0% AUC) in detecting the non-fundus CIFAR-10 and Fashion-MNIST images as OOD. The Dirichlet model also remains competitive with the softmax model in glaucoma classification, with AUC values [95% confidence intervals] of 91.2% [90.7%, 91.9%] and 86.7% [86.6%, 86.7%] on the REFUGE and LAG datasets, respectively. The softmax model achieves AUC values of 89.9% [89.4%, 90.7%] and 86.1% [86.0%, 86.2%] on the corresponding datasets.
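As an illustration of how an OOD detection AUC of this kind can be computed, the sketch below treats in-distribution images as label 0 and OOD images as label 1, uses the uncertainty score as the detector output, and estimates a percentile bootstrap confidence interval. The abstract does not state how its AUCs or confidence intervals were obtained, so the bootstrap is an assumption, the function names `auc` and `bootstrap_ci` are hypothetical, and the scores are made-up values for demonstration only.

```python
import random


def auc(scores, labels):
    """AUC as the probability that a positive (OOD) sample scores higher
    than a negative (in-distribution) sample; ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(scores, labels, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for the AUC (ASSUMED method)."""
    auc(scores, labels)  # fail early if one class is missing
    rng = random.Random(seed)
    indices = range(len(scores))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(indices) for _ in indices]
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < len(ys):  # skip resamples containing a single class
            stats.append(auc([scores[i] for i in sample], ys))
    stats.sort()
    lower = stats[int((1.0 - level) / 2.0 * n_boot)]
    upper = stats[int((1.0 + level) / 2.0 * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Made-up uncertainty scores: four in-distribution, four OOD images.
    scores = [0.10, 0.20, 0.15, 0.40, 0.35, 0.80, 0.90, 0.60]
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    print("AUC:", auc(scores, labels))
    print("95% CI:", bootstrap_ci(scores, labels))
```

An AUC near 50% means the uncertainty score barely separates the two sets, while values near 100% mean almost every OOD image receives a higher uncertainty than every in-distribution image.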
Conclusions :
The study demonstrates the effectiveness of our proposed uncertainty-aware Dirichlet model in OOD detection and glaucoma classification across diverse domains, extending its utility beyond the initial training dataset. Furthermore, the uncertainty scores produced by the model alert users to cases in which the model lacks sufficient information to make a confident decision.
This abstract was presented at the 2024 ARVO Annual Meeting, held in Seattle, WA, May 5-9, 2024.