June 2023
Volume 64, Issue 8
Open Access
ARVO Annual Meeting Abstract  |   June 2023
Generalizability of Convolutional Neural Network and Vision Transformer-Based OCT Segmentation Models
Author Affiliations & Notes
  • Adam Pely
    Genentech Inc, South San Francisco, California, United States
  • Zhichao Wu
    Centre for Eye Research Australia Ltd, East Melbourne, Victoria, Australia
    Surgery, The University of Melbourne Faculty of Medicine Dentistry and Health Sciences, Melbourne, Victoria, Australia
  • Theodore Leng
    Stanford University School of Medicine, Stanford, California, United States
  • Simon S. Gao
    Genentech Inc, South San Francisco, California, United States
  • Hao Chen
    Genentech Inc, South San Francisco, California, United States
  • Mohsen Hejrati
    Genentech Inc, South San Francisco, California, United States
  • Miao Zhang
    Genentech Inc, South San Francisco, California, United States
  • Footnotes
    Commercial Relationships   Adam Pely Genentech, Code E (Employment); Zhichao Wu None; Theodore Leng Genentech, Inc., Code C (Consultant/Contractor); Simon Gao Genentech, Inc., Code E (Employment); Hao Chen Genentech, Inc., Code E (Employment); Mohsen Hejrati Genentech, Inc., Code E (Employment); Miao Zhang Genentech, Inc., Code E (Employment)
  • Footnotes
    Support  F. Hoffmann-La Roche, Ltd., Basel, Switzerland, provided financial support for the study and participated in the study design; conduct of the study; collection, management, analysis, and interpretation of the data
Investigative Ophthalmology & Visual Science June 2023, Vol.64, 311. doi:

      Adam Pely, Zhichao Wu, Theodore Leng, Simon S. Gao, Hao Chen, Mohsen Hejrati, Miao Zhang; Generalizability of Convolutional Neural Network and Vision Transformer-Based OCT Segmentation Models. Invest. Ophthalmol. Vis. Sci. 2023;64(8):311.

      © ARVO (1962-2015); The Authors (2016-present)

Abstract

Purpose : Deep learning segmentation models typically experience performance degradation on data outside their training domain. In OCT image segmentation, a domain change may arise from differences in OCT scan patterns or devices. We thus aimed to compare the cross-scan-pattern and cross-device performance of a convolutional neural network (CNN)-based and a vision transformer (ViT)-based OCT segmentation model.

Methods : CNN-based (DeepLabv3+) and ViT-based (Swin-UPerNet) semantic segmentation models were trained and tuned on 20°-wide high-resolution B-scans (N=6,365; 1024 A-scans, 25-frame average) acquired with the Spectralis HRA+OCT (Heidelberg Engineering). These B-scans came from 129 individuals with intermediate age-related macular degeneration (AMD) or geographic atrophy (GA) and were split at the individual level at an 80:20 ratio. Eight retinal layers and drusen were annotated. Both models were pre-trained on ImageNet. Model performance was then evaluated by mean intersection over union (mIoU) on test sets of in-domain B-scans (N=882) and of out-of-domain B-scans: Spectralis 30°-wide scans (N=93; 1024 A-scans, 7-frame average), Spectralis 20°-wide high-speed scans (N=245; 512 A-scans, 15-frame average), and a mixture of Cirrus HD-OCT scans (Carl Zeiss Meditec; N=119; 6-mm-wide, 512 A-scans, no frame averaging, and 3-mm-wide, 245 A-scans, 4-frame average).
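
As an illustration of two steps named in the Methods, the sketch below shows one way an individual-level split and an mIoU score could be computed. It is a minimal sketch, not the authors' pipeline. The function names (`split_by_individual`, `mean_iou`), the `scan_to_individual` mapping, the fixed random seed, and the choice to skip classes absent from both masks are all assumptions made for the example; the abstract does not specify these details.

```python
import random

import numpy as np


def split_by_individual(scan_to_individual, train_fraction=0.8, seed=0):
    """Split B-scan IDs so that no individual contributes to both partitions.

    scan_to_individual: dict mapping a B-scan ID to an individual ID
    (hypothetical data structure, assumed for this sketch).
    """
    individuals = sorted(set(scan_to_individual.values()))
    rng = random.Random(seed)
    rng.shuffle(individuals)
    n_train = round(train_fraction * len(individuals))
    train_individuals = set(individuals[:n_train])
    train = [s for s, p in scan_to_individual.items() if p in train_individuals]
    held_out = [s for s, p in scan_to_individual.items() if p not in train_individuals]
    return train, held_out


def mean_iou(pred, target, num_classes):
    """Mean intersection over union across classes for one label mask.

    Assumption: classes absent from both prediction and target are skipped
    rather than counted as IoU 0 or 1.
    """
    pred = np.asarray(pred)
    target = np.asarray(target)
    ious = []
    for c in range(num_classes):
        p = pred == c
        t = target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class c does not occur in either mask
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))


if __name__ == "__main__":
    # Toy 2x3 masks with three classes (illustrative values only).
    target = [[0, 0, 1], [1, 2, 2]]
    pred = [[0, 1, 1], [1, 2, 0]]
    print(mean_iou(pred, target, num_classes=3))
```

Splitting by individual rather than by B-scan keeps scans from the same eye out of both partitions, which would otherwise inflate in-domain scores.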

Results : The CNN- and ViT-based models performed comparably on in-domain data (mIoU: 0.75 ± 0.06 for both) and on out-of-domain Spectralis 30°-wide data (mIoU: 0.76 ± 0.04 for both). However, the CNN-based model underperformed the ViT-based model on out-of-domain Spectralis 20°-wide high-speed scans (0.67 ± 0.22 vs. 0.78 ± 0.06) and on Cirrus scans (0.65 ± 0.15 vs. 0.68 ± 0.06). The performance of the CNN-based model also varied significantly across consecutive, similar B-scans in the same volume, possibly because the CNN captures undesired high-frequency signals that are imperceptible to humans.
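
To make the reported summaries concrete, the sketch below shows how per-B-scan scores might be condensed into a "mean ± spread" figure and how instability across consecutive B-scans could be quantified. It assumes the ± values are sample standard deviations over B-scans, which the abstract does not state. The helper names and the score list are hypothetical and are not study data.

```python
from statistics import mean, stdev


def summarize(scores):
    """Return (mean, sample standard deviation) of per-B-scan scores."""
    return mean(scores), stdev(scores)


def largest_adjacent_jump(scores):
    """Largest absolute change in score between neighbouring B-scans."""
    return max(abs(b - a) for a, b in zip(scores, scores[1:]))


if __name__ == "__main__":
    # Hypothetical mIoU values for consecutive B-scans in one volume
    # (made up for illustration; not taken from the study).
    volume_scores = [0.80, 0.30, 0.75, 0.78]
    avg, sd = summarize(volume_scores)
    print(f"mIoU: {avg:.2f} ± {sd:.2f}")
    print(f"largest jump between neighbours: {largest_adjacent_jump(volume_scores):.2f}")
```

A large jump between adjacent B-scans, which image nearly the same anatomy, points to model instability rather than a true change in the tissue.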

Conclusions : The ViT-based segmentation model outperformed the CNN-based model on out-of-domain OCT scans and demonstrated better generalizability to different OCT scan patterns and devices, potentially eliminating the need for additional domain adaptation steps.

This abstract was presented at the 2023 ARVO Annual Meeting, held in New Orleans, LA, April 23-27, 2023.

 

Comparison of segmentation performance on in-domain (row 1) and out-of-domain (rows 2-5) OCT scans.


 

Segmentation performance on consecutive B-scans from one volume of an out-of-domain Spectralis high-speed scan.

