Abstract
Purpose :
Deep learning segmentation models typically show degraded performance on data outside their training domain. In OCT image segmentation, such a domain shift may arise from different OCT scan patterns or devices. We therefore aimed to compare the cross-scan-pattern and cross-device performance of a convolutional neural network (CNN)-based and a vision transformer (ViT)-based OCT segmentation model.
Methods :
CNN-based (DeepLabV3+) and ViT-based (Swin-UPerNet) semantic segmentation models were trained and tuned on 20°-wide high-resolution B-scans (N=6,365; 1024 A-scans, 25-frame average) acquired with the Spectralis HRA+OCT (Heidelberg Engineering). These B-scans came from 129 individuals with intermediate age-related macular degeneration (AMD) or geographic atrophy (GA) and were split at the individual level at an 80:20 ratio. Eight retinal layers and drusen were annotated. Both models were pre-trained on ImageNet. Model performance was then evaluated using the mean intersection over union (mIoU) on test sets of in-domain B-scans (N=882) and of out-of-domain B-scans from Spectralis 30°-wide scans (N=93; 1024 A-scans, 7-frame average), Spectralis 20°-wide high-speed scans (N=245; 512 A-scans, 15-frame average), and a mixture of Cirrus HD-OCT scans (Carl Zeiss Meditec; N=119; 6-mm-wide, 512 A-scans, no frame averaging, and 3-mm-wide, 245 A-scans, 4-frame average).
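For readers unfamiliar with the metric, the following is a minimal sketch of how mIoU can be computed for one B-scan from a predicted and a reference label map. It is an illustration only, not the evaluation code used in this study. The function name `mean_iou`, the convention of skipping classes absent from both maps, and the toy label maps are assumptions; the abstract does not specify how absent classes or background were handled.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection over union across classes.

    pred and target are 2D lists of integer class labels with the same shape.
    Assumption: classes absent from both maps are skipped, not counted as 0 or 1.
    """
    inter = [0] * num_classes         # pixels where prediction and reference agree
    pred_count = [0] * num_classes    # pixels predicted as each class
    target_count = [0] * num_classes  # pixels annotated as each class

    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            pred_count[p] += 1
            target_count[t] += 1
            if p == t:
                inter[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - inter[c]
        if union > 0:
            ious.append(inter[c] / union)

    return sum(ious) / len(ious) if ious else float("nan")


# Toy 2x4 label maps with three hypothetical classes (not study data)
reference = [[0, 0, 1, 1],
             [0, 2, 2, 1]]
predicted = [[0, 0, 1, 1],
             [0, 2, 1, 1]]

score = mean_iou(predicted, reference, num_classes=3)
print(f"mIoU = {score:.2f}")  # per-class IoU: 1.00, 0.75, 0.50
```

In this toy case one pixel of class 2 is mislabeled as class 1, which lowers the IoU of both classes and gives an mIoU of 0.75.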
Results :
The performance of the CNN- and ViT-based models was comparable on in-domain data (mIoU: 0.75 ± 0.06 for both) and on Spectralis 30°-wide out-of-domain data (mIoU: 0.76 ± 0.04 for both). However, the CNN-based model underperformed the ViT-based model on out-of-domain Spectralis 20°-wide high-speed scans (0.67 ± 0.22 vs. 0.78 ± 0.06) and on Cirrus scans (0.65 ± 0.15 vs. 0.68 ± 0.06). The performance of the CNN-based model varied substantially across consecutive, similar B-scans within the same volume, possibly because CNNs capture undesired high-frequency signals that are imperceptible to humans.
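The scan-to-scan variability described above can be quantified in several ways. The sketch below shows one possible approach, assuming a per-B-scan mIoU value is available for each consecutive B-scan in a volume. The function name `summarize_volume`, the use of the sample standard deviation, and all numeric values are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def summarize_volume(per_scan_miou):
    """Summarize per-B-scan mIoU values listed in acquisition order.

    Returns the mean, the sample standard deviation, and the largest
    absolute change in mIoU between two consecutive B-scans.
    """
    jumps = [abs(b - a) for a, b in zip(per_scan_miou, per_scan_miou[1:])]
    return {
        "mean": mean(per_scan_miou),
        "sd": stdev(per_scan_miou),
        "max_jump": max(jumps),
    }


# Hypothetical per-B-scan mIoU values for two models on one volume (made up)
stable = [0.78, 0.77, 0.79, 0.78]
unstable = [0.80, 0.45, 0.82, 0.50]

for name, values in (("stable", stable), ("unstable", unstable)):
    s = summarize_volume(values)
    print(f"{name}: mean={s['mean']:.2f}, sd={s['sd']:.2f}, max jump={s['max_jump']:.2f}")
```

A large standard deviation together with large jumps between neighboring B-scans would match the kind of instability reported for the CNN-based model, whereas a low mean with small jumps would point to a uniformly harder volume.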
Conclusions :
The ViT-based segmentation model outperformed the CNN-based model on out-of-domain OCT scans and demonstrated better generalizability to different OCT scan patterns and devices, potentially eliminating the need for additional domain adaptation steps.
This abstract was presented at the 2023 ARVO Annual Meeting, held in New Orleans, LA, April 23-27, 2023.