Abstract
Purpose:
RETFound, a foundation model trained on 1.6 million unlabeled retinal images using Self-Supervised Learning (SSL), was recently introduced to boost models trained with minimal labeled data. It showed promising results: diabetic retinopathy classifiers for Color Fundus Photographs (CFP) generalize better to unseen datasets when pre-trained with RETFound. More generally, in this study, we assess the out-of-domain generalizability of multi-disease detection models for CFP when pre-trained with RETFound.
Methods:
Four CFP datasets were considered: OPHDIAT (France, diabetic population, 77,827 images), OphtaMaine (France, general population, 17,120 images), RIADD (India, general population, 3,200 images) and ODIR (China, general population, 10,000 images). Seven disease categories were targeted: Diabetes, Glaucoma, Cataract, AMD, Hypertension, Myopia and Others. Cross-dataset evaluation was conducted: RETFound was fine-tuned for multi-disease detection on one dataset and evaluated on the others. RETFound was compared with two pre-trained models sharing the same architecture (ViT) but trained on ImageNet: one using Supervised Learning (SL-ImageNet) and the other using SSL (SSL-ImageNet). In addition, we compared SL-ImageNet with SL-bestArch-ImageNet, also pre-trained through SL on ImageNet, but using the best-performing architecture. A paired-samples Wilcoxon test with Bonferroni correction was conducted to compare the per-class Area Under the receiver operating characteristic Curve (AUC) of each pre-training strategy.
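As a minimal sketch of the statistical comparison described above (not the authors' implementation), the snippet below computes per-class AUCs for two pre-training strategies on an out-of-domain test set and compares them with a paired-samples Wilcoxon test followed by a Bonferroni correction. All variable names, data shapes, and the number of comparisons are illustrative assumptions.

```python
# Illustrative sketch: per-class AUC comparison of two pre-training strategies
# with a paired-samples Wilcoxon test and Bonferroni correction.
import numpy as np
from sklearn.metrics import roc_auc_score
from scipy.stats import wilcoxon

def per_class_aucs(y_true, y_score):
    """Per-class AUC for multi-label predictions.

    y_true: (n_samples, n_classes) binary labels
    y_score: (n_samples, n_classes) predicted scores
    """
    return np.array([
        roc_auc_score(y_true[:, c], y_score[:, c])
        for c in range(y_true.shape[1])
    ])

# Hypothetical out-of-domain labels and predictions from two strategies
# (7 disease categories, as in the study; the data here are synthetic).
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=(500, 7))
scores_strategy_a = np.clip(y_true + rng.normal(0, 0.6, y_true.shape), 0, 1)
scores_strategy_b = np.clip(y_true + rng.normal(0, 0.9, y_true.shape), 0, 1)

auc_a = per_class_aucs(y_true, scores_strategy_a)
auc_b = per_class_aucs(y_true, scores_strategy_b)

# Paired-samples Wilcoxon signed-rank test on the per-class AUCs.
stat, p = wilcoxon(auc_a, auc_b)

# Bonferroni correction: multiply the p-value by the number of pairwise
# comparisons performed (the value 3 is an assumption for illustration).
n_comparisons = 3
p_corrected = min(p * n_comparisons, 1.0)

print(f"median AUC A={np.median(auc_a):.4f}, B={np.median(auc_b):.4f}, "
      f"Wilcoxon p={p:.3g}, Bonferroni-corrected p={p_corrected:.3g}")
```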
Results:
On out-of-domain datasets, the median per-category AUC was 0.8207, 0.7296, 0.7646 and 0.8399 when fine-tuning RETFound, SL-ImageNet, SSL-ImageNet and SL-bestArch-ImageNet, respectively; the best architecture for SL was efficientnet-b5-ns. RETFound achieved significantly higher performance than SSL-ImageNet (p=0.0044) and SL-ImageNet (p=7.4e-07). However, SL-bestArch-ImageNet significantly outperformed SL-ImageNet (p=4.8e-06).
Conclusions:
This study demonstrates the superior out-of-domain generalization performance of RETFound for multi-disease detection in CFP, compared to SL or SSL pre-training on ImageNet. It also highlights that ViT is not the best architecture for this task, suggesting that further improvement could be achieved by building foundation models for other architectures.
This abstract was presented at the 2024 ARVO Annual Meeting, held in Seattle, WA, May 5-9, 2024.