Abstract
Purpose:
Diabetic Retinopathy (DR) is the leading cause of new cases of blindness in the US, and its early symptoms can go completely unnoticed in some patients. There is currently no evaluation study exploring the optimal strategy for training an Artificial Intelligence (AI) based DR classification model. To remedy this, we compare existing Supervised Learning (SL) approaches with various Self-Supervised Learning (SSL) techniques across different backbone architectures, pretraining strategies, and generative versus contrastive SSL methodologies.
Methods:
We started with a dataset of 35,126 fundus images acquired on different color fundus photography (CFP) imaging platforms with varying image quality. For each patient, both right- and left-eye images were labelled 0 to 4, from “No DR” to “Proliferative DR”. We explored both SL and SSL techniques for two overarching architectures, Convolutional Neural Networks (CNN) and Vision Transformers (ViT). SSL training used a ResNet50 backbone with self-DIstillation with NO labels (DINOv1) for contrastive SSL, and a Vision Transformer Masked Auto-Encoder (ViTMAE) for generative SSL. Furthermore, SSL and SL models pre-trained on ImageNet (IN) were explored, with the SSL models fine-tuned a second time on domain-specific (DS) data. Each model was then trained for downstream classification.
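As a rough illustration of the downstream classification stage described above, the sketch below fine-tunes a ResNet50 backbone for 5-class DR grading in PyTorch. The checkpoint path, optimizer choice, and hyperparameters are hypothetical and are not taken from this study; the SSL pretraining itself (DINOv1 or ViTMAE) is not reproduced here.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 5  # DR grades 0 ("No DR") through 4 ("Proliferative DR")

# Backbone: ResNet50. Starting from ImageNet weights corresponds to the "IN"
# condition; loading an SSL checkpoint instead would correspond to the
# contrastive-SSL condition described in Methods.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)

# Hypothetical path to a DINO-pretrained checkpoint on fundus images
# (the "IN + DS" condition); not part of the published study.
# backbone.load_state_dict(torch.load("dino_fundus_pretrained.pth"), strict=False)

# Replace the ImageNet classification head with a 5-way DR grading head.
backbone.fc = nn.Linear(backbone.fc.in_features, NUM_CLASSES)

optimizer = torch.optim.AdamW(backbone.parameters(), lr=1e-4)  # assumed hyperparameters
criterion = nn.CrossEntropyLoss()

def train_one_epoch(model, loader, device="cuda"):
    """Standard supervised fine-tuning loop for the downstream DR grading task."""
    model.to(device).train()
    for images, labels in loader:  # loader yields (fundus image batch, DR grade batch)
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```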
Results:
We performed four comparisons. A) Between ViT and ResNet50 with SL only, we find no statistically significant difference between the from-scratch versions, but a statistically significant result with the IN fine-tuned ViT outperforming the IN fine-tuned ResNet50. B) Between performing SSL with ImageNet only and with IN plus DS data, we find statistically significantly better performance with the additional DS SSL for DINOv1, but no statistically significant difference for ViTMAE. C) Between ViTMAE and DINOv1 SSL, we find no statistically significant difference between their best versions, SSL with IN and DS data for DINOv1 and IN only for ViTMAE. D) Between SSL (from C) and SL, we find that SSL strongly outperforms SL for either backbone.
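The abstract does not specify which statistical test underlies these comparisons; purely for illustration, the hypothetical sketch below shows one common way to compare two classifiers evaluated on the same test set, a paired bootstrap test on the accuracy difference.

```python
import numpy as np

def paired_bootstrap_test(y_true, pred_a, pred_b, n_boot=10_000, seed=0):
    """Two-sided paired bootstrap test for the accuracy difference between
    two classifiers scored on the same held-out fundus images.
    Illustrative only; not necessarily the test used in the study."""
    rng = np.random.default_rng(seed)
    y_true, pred_a, pred_b = map(np.asarray, (y_true, pred_a, pred_b))
    n = len(y_true)
    observed = (pred_a == y_true).mean() - (pred_b == y_true).mean()
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)  # resample test cases with replacement
        diffs[i] = (pred_a[idx] == y_true[idx]).mean() - (pred_b[idx] == y_true[idx]).mean()
    # Two-sided p-value: twice the smaller tail of the bootstrap distribution around zero.
    p_value = min(1.0, 2 * min((diffs <= 0).mean(), (diffs >= 0).mean()))
    return observed, p_value
```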
Conclusions:
Our results show the efficacy of fine-tuning on a large dataset for both SSL and SL overall. They also show that SSL pipelines yield better results than SL alone, though the addition of domain-specific data needs to be explored further. These results suggest that SSL can help overcome some of the shortcomings of traditional SL approaches in the medical domain.
This abstract was presented at the 2024 ARVO Annual Meeting, held in Seattle, WA, May 5-9, 2024.