Abstract
Purpose:
To evaluate the effect of contrastive learning (CL)-based pre-training on the performance of diabetic retinopathy (DR) classification.
Methods:
We developed a CL-based method to produce models with better representations and initializations for the detection of DR in color fundus images. Our model takes image data, applies a two-step data augmentation (neural style transfer followed by random geometric augmentation) to create a pair of augmented views, and then uses a contrastive loss to maximize the similarity of the two views in the encoded space. The contrastive learning framework consists of a ResNet50 encoder (convolutional, ReLU, max-pooling, and dense layers) with a projection head that maps the representation into the space where the contrastive loss is applied. The CL training uses a large batch size of 2,048 and runs for 100 epochs. Once the model is trained, the encoder (ResNet50) serves as a pre-trained model for a DR classification task (non-referable vs. referable DR). We compare our model with a model initialized with ImageNet weights. The models are trained and validated on a Kaggle dataset (35,126 fundus images) with 10-fold cross-validation (split into training and validation sets) and tested independently on real-life data (2,500 fundus images) from the University of Illinois (UI) Retina Clinic. For the DR classification, we used a learning rate of 1e-5 with gradual decay, a batch size of 64, a dropout rate of 0.4, and the Adam optimizer.
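To make the two-step augmentation concrete, the sketch below (PyTorch/torchvision) composes a placeholder style-transfer step with random geometric transforms. The specific transforms, crop size, and rotation range are assumptions; the abstract states only "neural style transfer + random geometric augmentation", and the style-transfer network itself is not specified, so it appears here as an abstract callable.

```python
from torchvision import transforms

# Random geometric augmentation; the exact transforms, crop size, and rotation
# range are assumptions (the abstract says only "random geometric augmentation").
geometric = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(degrees=30),
    transforms.ToTensor(),
])

def make_views(img, style_transfer):
    """Two-step augmentation: neural style transfer, then geometric transforms.

    `style_transfer` is a stand-in callable for the (unspecified) style-transfer
    network; applying the full pipeline twice yields the positive pair."""
    return geometric(style_transfer(img)), geometric(style_transfer(img))
```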
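The framework itself can be sketched as a ResNet50 encoder whose classification head is replaced by an identity, an MLP projection head, and a contrastive loss over the two views. The NT-Xent form of the loss, the projection dimension (128), and the temperature (0.5) are assumptions borrowed from the standard SimCLR recipe; the abstract specifies only a ResNet50 encoder, a projection head, and a "contrastive loss".

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class ContrastiveModel(nn.Module):
    """ResNet50 encoder plus an MLP projection head (SimCLR-style sketch)."""
    def __init__(self, proj_dim=128):  # proj_dim is an assumption, not from the abstract
        super().__init__()
        self.encoder = torchvision.models.resnet50(weights=None)
        feat_dim = self.encoder.fc.in_features          # 2048 for ResNet50
        self.encoder.fc = nn.Identity()                 # expose pooled features
        self.projector = nn.Sequential(                 # projection head
            nn.Linear(feat_dim, feat_dim),
            nn.ReLU(),
            nn.Linear(feat_dim, proj_dim),
        )

    def forward(self, x):
        h = self.encoder(x)          # representation reused for DR classification
        z = self.projector(h)        # projection used only by the contrastive loss
        return F.normalize(z, dim=1) # unit-normalize so dot products are cosines

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss: the two views of each image are positives; all others negatives."""
    n = z1.size(0)
    z = torch.cat([z1, z2], dim=0)                     # (2n, d)
    sim = z @ z.t() / temperature                      # pairwise cosine similarities
    sim.fill_diagonal_(float('-inf'))                  # exclude self-pairs
    # view i's positive sits at index i+n (and vice versa)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)

# Each training step: model = ContrastiveModel()
# loss = nt_xent_loss(model(view1), model(view2))
```

The large batch size of 2,048 reported above matters for this kind of loss because each image's negatives are simply the other images in the batch.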
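For the downstream stage, a minimal sketch of the fine-tuning configuration, using the hyperparameters reported above, might look like the following. The single-logit binary head and the exponential form of the learning-rate decay (gamma=0.98) are assumptions; the abstract says only "gradual decay", and the checkpoint loading is a hypothetical placeholder.

```python
import torch
import torch.nn as nn
import torchvision

# Rebuild the CL pre-trained encoder for referable-DR classification.
encoder = torchvision.models.resnet50(weights=None)
encoder.fc = nn.Identity()                        # expose 2048-d pooled features
# encoder.load_state_dict(cl_pretrained_weights)  # hypothetical: load CL weights

classifier = nn.Sequential(
    encoder,
    nn.Dropout(p=0.4),                            # dropout rate from the abstract
    nn.Linear(2048, 1),                           # one logit: non-referable vs. referable DR
)

optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-5)  # reported learning rate
# "Gradual decay" is unspecified; exponential decay is one plausible reading.
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.98)
criterion = nn.BCEWithLogitsLoss()                # binary cross-entropy on the logit
# Training then uses the reported batch size of 64.
```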
Results:
The CL-trained model performed significantly better than the ImageNet-pretrained model (AUC of 0.94 (CI 0.90-0.99, p<0.001) vs. 0.82 (CI 0.75-0.85, p<0.001) on Kaggle data, and 0.91 (CI 0.88-0.96, p<0.001) vs. 0.80 (CI 0.74-0.85, p<0.001) on UI data). We then reduced the training set to 10% of the total data. At 10% training data, the AUC was 0.84 (CI 0.80-0.87, p<0.05) for the CL model vs. 0.69 (CI 0.59-0.75, p<0.05) for the ImageNet model on Kaggle data, and 0.81 (CI 0.79-0.86, p<0.05) vs. 0.65 (CI 0.60-0.75, p<0.05) on UI data. These results indicate that the model generalized well (transferring from Kaggle to UI data) and that CL-based training achieved good diagnostic accuracy with only a small fraction (10%) of the annotated data.
Conclusions:
CL-based pre-training with neural style transfer significantly improves DR classification performance and enables training with small annotated datasets, thereby reducing the ground-truth annotation burden on clinicians.
This abstract was presented at the 2022 ARVO Annual Meeting, held in Denver, CO, May 1-4, 2022, and virtually.