Abstract
Purpose :
Early detection of hypertensive retinopathy (HR) is of great interest for avoiding irreversible damage to the retinal microcirculation. Computer-aided systems are a promising solution for cost-effective disease screening using fundus images. A paradigm shift is currently emerging in AI for medical imaging, driven by large-scale pre-trained foundation models (FMs) and their strong transferability. This study aims to elucidate whether emerging FMs can be effectively adapted to the challenging task of HR detection.
Methods :
We perform an efficient adaptation of FMs to HR detection by linear probing, i.e., fitting a linear classifier on top of frozen pre-trained features. Two data regimes are examined: a few-shot regime using 10 samples per class, and a large-data regime using all training samples. Three FMs are considered. First, RETFound (Zhou et al., Nature 2023), a self-supervised autoencoder pre-trained on 800K fundus images, with a ViT-B/16 backbone. Second, FLAIR (Silva-Rodriguez et al., ArXiv 2023), a vision-language model encoding domain-specialized knowledge via text supervision, trained on 300K fundus images and 96 different retinal conditions using a ResNet-50 (RN50) backbone (25M parameters). Finally, BiomedCLIP (Zhang et al., ArXiv 2023), a generalist ViT-B/16 transformer with 86M parameters pre-trained on 15M images from different domains (radiology, ophthalmology, etc.).
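As an illustration only, linear probing in the few-shot regime can be sketched as follows. This is a minimal sketch, not the implementation used in the study: the embeddings are synthetic stand-ins for frozen FM features, and the function names (`sample_few_shot`, `fit_linear_probe`, `predict_proba`) and hyperparameters are assumptions chosen for the example.

```python
import numpy as np

def sample_few_shot(labels, shots_per_class, rng):
    """Return indices of `shots_per_class` randomly chosen samples per class."""
    picked = []
    for cls in np.unique(labels):
        cls_idx = np.flatnonzero(labels == cls)
        picked.extend(rng.choice(cls_idx, size=shots_per_class, replace=False))
    return np.array(picked)

def fit_linear_probe(features, labels, lr=0.1, epochs=500, weight_decay=1e-3):
    """Fit a logistic-regression head on fixed features by gradient descent.

    Only the linear weights are learned; the features are never updated,
    which is what makes this a linear probe.
    """
    n_samples, n_dims = features.shape
    w = np.zeros(n_dims)
    b = 0.0
    for _ in range(epochs):
        probs = 1.0 / (1.0 + np.exp(-(features @ w + b)))
        err = probs - labels  # gradient of the cross-entropy w.r.t. the logits
        w -= lr * (features.T @ err / n_samples + weight_decay * w)
        b -= lr * err.mean()
    return w, b

def predict_proba(features, w, b):
    """Probability of the positive class for each row of `features`."""
    return 1.0 / (1.0 + np.exp(-(features @ w + b)))

# Synthetic stand-in for frozen FM embeddings (assumption: 32-dim, two classes).
rng = np.random.default_rng(0)
n_per_class, dim = 200, 32
feats = np.vstack([
    rng.normal(-0.5, 1.0, (n_per_class, dim)),  # class 0
    rng.normal(0.5, 1.0, (n_per_class, dim)),   # class 1
])
labels = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])

# Few-shot regime: 10 labelled samples per class, as in the abstract.
few_idx = sample_few_shot(labels, 10, rng)
w, b = fit_linear_probe(feats[few_idx], labels[few_idx])
scores = predict_proba(feats, w, b)
accuracy = ((scores > 0.5) == labels).mean()
```

In practice, `feats` would be replaced by embeddings extracted once from the chosen FM, so that only the small linear head is trained.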
Results :
We used the CGI-HR dataset with 5-fold cross-validation. Our results show that generalist models such as BiomedCLIP fail to provide a meaningful classification (0.620 AUC). Although pre-training on domain-specific data with RETFound alleviates the performance drop (0.679 AUC), performance remains considerably lower than with FLAIR (0.786 AUC), which leverages expert knowledge through text supervision. Similar trends are observed in the low-data regime, where FLAIR shows a 5% improvement over the other FMs.
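For readers unfamiliar with the evaluation protocol, a 5-fold cross-validated AUC can be sketched as below. This is a hedged illustration on synthetic data, not the study's pipeline: the helper names (`roc_auc`, `stratified_folds`, `ridge_probe_scores`) are hypothetical, and a closed-form ridge regression is swapped in as a simple stand-in for the linear classifier.

```python
import numpy as np

def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a pair.
    """
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def stratified_folds(labels, n_folds, rng):
    """Yield (train_idx, test_idx) pairs with each class spread over all folds."""
    fold_of = np.empty(len(labels), dtype=int)
    for cls in np.unique(labels):
        cls_idx = rng.permutation(np.flatnonzero(labels == cls))
        fold_of[cls_idx] = np.arange(len(cls_idx)) % n_folds
    for k in range(n_folds):
        yield np.flatnonzero(fold_of != k), np.flatnonzero(fold_of == k)

def ridge_probe_scores(train_x, train_y, test_x, alpha=1.0):
    """Stand-in linear classifier: ridge regression onto the centred labels."""
    mean = train_x.mean(axis=0)
    xc = train_x - mean
    gram = xc.T @ xc + alpha * np.eye(xc.shape[1])
    w = np.linalg.solve(gram, xc.T @ (train_y - train_y.mean()))
    return (test_x - mean) @ w

# Synthetic stand-in for pre-extracted features (assumption: 16-dim, two classes).
rng = np.random.default_rng(0)
n_per_class, dim = 100, 16
feats = np.vstack([
    rng.normal(-0.3, 1.0, (n_per_class, dim)),  # class 0
    rng.normal(0.3, 1.0, (n_per_class, dim)),   # class 1
])
labels = np.concatenate([np.zeros(n_per_class, dtype=int),
                         np.ones(n_per_class, dtype=int)])

# Train on four folds, score the held-out fold, and average the five AUCs.
fold_aucs = []
test_sizes = []
for train_idx, test_idx in stratified_folds(labels, 5, rng):
    scores = ridge_probe_scores(feats[train_idx], labels[train_idx], feats[test_idx])
    fold_aucs.append(roc_auc(labels[test_idx], scores))
    test_sizes.append(len(test_idx))
mean_auc = float(np.mean(fold_aucs))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the values reported above should be read.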
Conclusions :
This work studies the capability of recently released FMs for efficient HR detection. A careful selection of the FM used for adaptation is necessary: a larger parameter count or a larger amount of pre-training data is not necessarily preferable to a domain-specific model guided by expert knowledge. From a practical point of view, the potential for clinical application of this technique should be considered, as it allows screening to be performed non-invasively, rapidly, safely and cost-effectively.
This abstract was presented at the 2024 ARVO Annual Meeting, held in Seattle, WA, May 5-9, 2024.