Abstract
Purpose :
AI-assisted interpretation of ultra-widefield (UWF) fundus photography can enhance pediatric retinal disease screening, leading to earlier disease detection. RETFound is a large Vision Transformer (ViT-L) model pretrained on 1.6 million color fundus photographs and OCT B-scans that has demonstrated strong performance on many tasks (e.g., grading diabetic retinopathy and detecting ischemic stroke and heart failure). An open question is whether RETFound generalizes to classification tasks based on UWF images, which were not part of its pretraining dataset. We hypothesize that, owing to its retina-specific pretraining, RETFound will outperform ImageNet-pretrained (IN-p) ViT-L and ResNet-50 models in identifying pediatric retinal pathology from UWF images.
Methods :
A total of 304 UWF images were retrospectively collected from 62 healthy pediatric patients undergoing routine eye examination at DukeHealth from 2014 to 2020; class labels of normal and abnormal were assigned by two independent vitreoretinal fellows. Three machine learning models (RETFound, IN-p ViT-L, and IN-p ResNet-50) were fine-tuned on the binary classification task of distinguishing normal from abnormal fundus images, with model selection based on validation set performance. For each of the three models, the fine-tuned version with the highest validation accuracy was assessed on the held-out test set. Performance metrics included accuracy, area under the receiver operating characteristic curve (AUROC), average precision, F1 score, precision, and recall.
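As an illustration of the threshold-based metrics listed above, the following is a minimal Python sketch, not the study's evaluation code. The abstract does not state how precision, recall, and F1 were averaged over the two classes; the sketch assumes support-weighted averaging, which is one common choice. The function name `weighted_metrics` and the toy labels are hypothetical. AUROC and average precision are omitted because they require predicted scores rather than hard labels.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1.

    Assumption: per-class scores are averaged with weights equal to each
    class's share of the true labels (the abstract does not specify this).
    """
    n = len(y_true)
    support = Counter(y_true)  # number of true examples per class
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n

    precision = recall = f1 = 0.0
    for cls, count in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        predicted = sum(p == cls for p in y_pred)
        cls_precision = tp / predicted if predicted else 0.0
        cls_recall = tp / count
        denom = cls_precision + cls_recall
        cls_f1 = 2 * cls_precision * cls_recall / denom if denom else 0.0

        weight = count / n
        precision += weight * cls_precision
        recall += weight * cls_recall
        f1 += weight * cls_f1

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (0 = normal, 1 = abnormal); these are not data from the study.
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0]
scores = weighted_metrics(y_true, y_pred)
```

Under support-weighted averaging, recall is mathematically equal to accuracy, because each class's recall is weighted by its share of the true labels. Other averaging schemes (for example, unweighted macro averaging) would generally give different values.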
Results :
UWF images were randomly divided into training, validation, and test datasets. Of the 304 images, 82 (27%) were labeled as normal and 222 (73%) as abnormal. For the ResNet-50, ViT-L, and RETFound models, respectively, we report the following: classification accuracy (66.7%, 64.1%, 81.3%), AUROC (77.8%, 76.4%, 81.5%), F1 score (61.0%, 61.1%, 77.7%), average precision (73.7%, 67.1%, 77.6%), precision (61.6%, 60.7%, 75.8%), and recall (66.6%, 64.1%, 81.3%).
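The random division into training, validation, and test sets can be sketched as follows. This is an illustrative example only: the abstract reports neither the split proportions nor whether the split was stratified by class or grouped by patient. The sketch assumes a hypothetical 70/15/15 split, stratified by label and performed at the image level; the function name `stratified_split` is likewise hypothetical. Only the class counts (82 normal, 222 abnormal) come from the abstract.

```python
import random


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Return (train, val, test) lists of indices.

    Each class is shuffled and split separately so that all three sets
    keep roughly the same normal/abnormal ratio. The fractions are an
    assumption, not values reported in the abstract.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, val, test = [], [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(fractions[0] * len(idx))
        n_val = round(fractions[1] * len(idx))
        train.extend(idx[:n_train])
        val.extend(idx[n_train:n_train + n_val])
        test.extend(idx[n_train + n_val:])  # remainder goes to the test set
    return train, val, test


# Class counts taken from the abstract; the ordering is arbitrary.
labels = ["normal"] * 82 + ["abnormal"] * 222
train, val, test = stratified_split(labels)
```

Because the 304 images come from 62 patients, an image-level split like this one can place images from the same patient in different sets. A patient-level split would avoid that, but the abstract does not say which approach was used.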
Conclusions :
RETFound outperformed the IN-p ResNet-50 and ViT-L models across all metrics. Because RETFound and ViT-L share the same architecture, this result suggests that retina-specific pretraining, even without UWF images, improves classification accuracy compared with conventional ImageNet pretraining. This study underscores RETFound’s potential as an optimal basis for AI analysis of UWF images.
This abstract was presented at the 2024 ARVO Annual Meeting, held in Seattle, WA, May 5-9, 2024.