Abstract
Purpose :
Image-based medical artificial intelligence (AI) algorithms can be overly sensitive to changes in image features, which can negatively affect diagnostic performance and repeatability (test/retest with different images from the same patient). Model ensembling via Monte Carlo dropout (MCD) has been proposed as a solution to reduce model variance, thereby increasing repeatability and possibly classification performance [1]. We evaluate the implementation of MCD for detection of plus disease in retinopathy of prematurity (ROP).
Methods :
A dataset of color retinal fundus images collected by the Imaging and Informatics in ROP Consortium (28,426 images, 965 babies) was stratified by patient into training, validation, and test datasets (60%/15%/25%) and used to train a ResNet-18 deep learning (DL) model to classify images as normal, pre-plus disease, or plus disease. Spatial dropout layers (P=0.2) were added after every residual block and, unconventionally, kept active during inference, so that model structure and predictions were slightly altered with each forward pass (n=2). Softmax outputs of each pass were averaged and converted into a continuous 1–9 vascular severity score (VSS), which was used to measure area under the precision-recall curve (AUPR, normal/pre-plus versus plus). Two random images from each eye were used to measure repeatability via Bland-Altman limits of agreement (LoA) and the classification disagreement rate. Significance (p < 0.05) was assessed via t-tests.
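The inference and repeatability steps described in the Methods can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the study's implementation: a toy linear classifier stands in for ResNet-18, element-wise dropout on a feature vector stands in for spatial (channel-wise) dropout, and the VSS is assumed to be the probability-weighted sum 1·P(normal) + 5·P(pre-plus) + 9·P(plus), a mapping the abstract does not state. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mc_dropout_predict(features, weights, p=0.2, n_passes=2, rng=None):
    """Average softmax outputs over n_passes forward passes with dropout kept on.

    Toy stand-in: element-wise dropout on a feature vector followed by a
    linear layer (the abstract uses spatial dropout inside ResNet-18).
    """
    rng = np.random.default_rng() if rng is None else rng
    probs = []
    for _ in range(n_passes):
        mask = rng.random(features.shape) >= p      # keep each unit with prob 1 - p
        dropped = features * mask / (1.0 - p)       # inverted-dropout rescaling
        probs.append(softmax(dropped @ weights))
    return np.mean(probs, axis=0)                   # ensemble average per image

def severity_score(probs, class_values=(1.0, 5.0, 9.0)):
    """Assumed VSS mapping: probability-weighted class values on a 1-9 scale."""
    return probs @ np.asarray(class_values)

def limits_of_agreement(a, b):
    """Bland-Altman 95% limits: mean difference +/- 1.96 * SD of differences."""
    d = np.asarray(a) - np.asarray(b)
    mean, sd = d.mean(), d.std(ddof=1)
    return mean - 1.96 * sd, mean + 1.96 * sd

def disagreement_rate(probs_a, probs_b):
    """Fraction of image pairs whose predicted classes differ."""
    return float(np.mean(probs_a.argmax(axis=1) != probs_b.argmax(axis=1)))

# Synthetic example: two "images" per eye, the second a noisy copy of the first.
rng = np.random.default_rng(0)
weights = rng.normal(size=(8, 3))                   # 8 features -> 3 classes
img_a = rng.normal(size=(50, 8))
img_b = img_a + 0.3 * rng.normal(size=(50, 8))

probs_a = mc_dropout_predict(img_a, weights, rng=rng)
probs_b = mc_dropout_predict(img_b, weights, rng=rng)
vss_a, vss_b = severity_score(probs_a), severity_score(probs_b)

loa_low, loa_high = limits_of_agreement(vss_a, vss_b)
rate = disagreement_rate(probs_a, probs_b)
print(f"LoA: [{loa_low:.2f}, {loa_high:.2f}], disagreement rate: {rate:.1%}")
```

The abstract reports LoA as a single value; whether that is the half-width or the full width of the interval computed here is not stated, so the sketch returns both bounds.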
Results :
MCD significantly improved AUPR, LoA, and the classification disagreement rate, with relative improvements of 12.5%, 20.4%, and 21.7%, respectively. AUPR [95% confidence interval (CI)] increased to 0.800 [0.799, 0.801] from 0.711 [0.702, 0.719] (p < 0.001), LoA decreased to 2.63 [2.61, 2.66] from 3.31 [3.27, 3.34] (p < 0.001), and the disagreement rate decreased to 22.6% [22.0%, 23.1%] from 28.8% [28.2%, 29.5%] (p < 0.001).
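The headline percentages are consistent with relative changes between the reported point estimates. The short check below recomputes them from the rounded values given above; the helper name `relative_change` is hypothetical.

```python
def relative_change(before, after):
    """Absolute relative change from `before` to `after`, in percent."""
    return abs(after - before) / before * 100.0

# (without MCD, with MCD) point estimates as reported in the Results
reported = {
    "AUPR": (0.711, 0.800),
    "LoA": (3.31, 2.63),
    "disagreement rate (%)": (28.8, 22.6),
}

for name, (before, after) in reported.items():
    print(f"{name}: {relative_change(before, after):.1f}% relative change")
```

From the rounded estimates this gives roughly 12.5%, 20.5%, and 21.5%. The small gaps from the stated 20.4% and 21.7% presumably reflect percentages computed from unrounded values.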
Conclusions :
Without increasing model complexity or training time, MCD provided a significant increase in performance and repeatability. The technique is simple to implement yet effective, and a strong argument could be made for using it in place of non-MCD DL models. This has important implications for medical AI algorithms applied to image-based diseases such as ROP, where poor repeatability could lead to diagnostic and therapeutic errors with the potential for life-altering consequences.
This abstract was presented at the 2023 ARVO Annual Meeting, held in New Orleans, LA, April 23-27, 2023.