Abstract
Purpose:
Current deep neural networks (DNNs) require large numbers of labeled examples for training. Recently, self-supervised learning (SSL) has emerged as a promising technique for pretraining DNNs on vast amounts of unlabeled imaging data, producing general-purpose “foundation models”. However, the best techniques for pretraining DNNs on retinal optical coherence tomography (OCT) data that allow effective fine-tuning for both OCT classification and segmentation downstream tasks remain unclear.
Methods:
SSL for the image classification task was based on the Generic Autodidactic Models (Models Genesis) paradigm. It consisted of a series of pretext image restoration tasks composed of non-linear intensity shifts, in- and out-painting, local pixel shuffling, and patch swapping. The SSL-pretrained U-net encoder was then fine-tuned for the OCT classification task of distinguishing between diabetic macular edema (DME) and retinal vein occlusion (RVO). For the retinal layer segmentation task, SSL involved converting the layer boundaries regressed by a U-net-based DNN into pixel-wise segmentation maps in order to restore the layer content of the input scan.
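For illustration only, the classification pretext tasks can be sketched as a corruption pipeline applied to a normalized 2-D B-scan, with the original scan serving as the reconstruction target. The NumPy sketch below is an assumption-laden approximation, not the exact implementation used here: window sizes, hole counts, swap counts, and the piecewise-linear stand-in for the non-linear intensity curve are illustrative choices, and out-painting (the complement of in-painting) is omitted for brevity.

```python
import numpy as np

def nonlinear_intensity_shift(img, rng):
    """Map intensities through a random monotone curve (piecewise-linear
    stand-in for the smooth Bezier-style transform of Models Genesis)."""
    ctrl = np.sort(rng.uniform(0.0, 1.0, size=4))          # random control values
    return np.interp(img, np.linspace(0.0, 1.0, 4), ctrl)

def local_pixel_shuffle(img, rng, n_windows=50, w=8):
    """Shuffle pixels inside small windows; global anatomy is preserved."""
    out = img.copy()
    h, ww = out.shape
    for _ in range(n_windows):
        y, x = rng.integers(0, h - w), rng.integers(0, ww - w)
        patch = out[y:y + w, x:x + w].flatten()
        rng.shuffle(patch)
        out[y:y + w, x:x + w] = patch.reshape(w, w)
    return out

def in_painting(img, rng, n_holes=3, size=32):
    """Replace random inner regions with noise; the network must restore them."""
    out = img.copy()
    h, w = out.shape
    for _ in range(n_holes):
        y, x = rng.integers(0, h - size), rng.integers(0, w - size)
        out[y:y + size, x:x + size] = rng.uniform(0.0, 1.0, size=(size, size))
    return out

def patch_swap(img, rng, n_swaps=5, size=16):
    """Swap the contents of randomly chosen patch pairs within the scan."""
    out = img.copy()
    h, w = out.shape
    for _ in range(n_swaps):
        y1, x1 = rng.integers(0, h - size), rng.integers(0, w - size)
        y2, x2 = rng.integers(0, h - size), rng.integers(0, w - size)
        tmp = out[y1:y1 + size, x1:x1 + size].copy()
        out[y1:y1 + size, x1:x1 + size] = out[y2:y2 + size, x2:x2 + size]
        out[y2:y2 + size, x2:x2 + size] = tmp
    return out

def corrupt(img, rng):
    """Compose a random subset of pretext corruptions; the pretraining target
    is the original, uncorrupted scan."""
    x = nonlinear_intensity_shift(img, rng)
    if rng.random() < 0.5:
        x = local_pixel_shuffle(x, rng)
    if rng.random() < 0.5:
        x = in_painting(x, rng)
    if rng.random() < 0.5:
        x = patch_swap(x, rng)
    return x

# Example: corrupt one synthetic, intensity-normalized B-scan.
rng = np.random.default_rng(0)
scan = rng.uniform(0.0, 1.0, size=(256, 512))
corrupted = corrupt(scan, rng)   # network input; `scan` is the restoration target
```

After pretraining the restoration U-net on such pairs, it is the encoder weights that are transferred and fine-tuned for the downstream classification task.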
Results:
A total of 5000 OCT volumes acquired with the Spectralis OCT device were used as the unlabeled data set for SSL. On the downstream classification task of distinguishing between DME and RVO (Fig. 1), the SSL-pretrained network reached an AUC > 0.8 with as few as 50 labeled cases, compared to an AUC of 0.65 obtained in a purely supervised setting. When a sufficient number of labeled cases (>500) was available, both approaches achieved a similar AUC of 0.95. In the retinal layer segmentation task, the SSL-pretrained network achieved the same mean absolute error with 25% of the labeled data as a model trained from scratch on the entire labeled data set.
Conclusions:
We evaluated self-supervised methodologies for OCT image analysis on clinically relevant image diagnosis and quantification tasks. Our SSL-pretrained models showed effective fine-tuning behavior, outperforming models trained from scratch. This is a promising step toward obtaining label-efficient foundation models for retinal OCT without the need for large labeled data sets and extensive training efforts.
This abstract was presented at the 2023 ARVO Annual Meeting, held in New Orleans, LA, April 23-27, 2023.