Investigative Ophthalmology & Visual Science
July 2024
Volume 65, Issue 9
Open Access
ARVO Imaging in the Eye Conference Abstract  |   July 2024
Advancing Diabetic Retinopathy Staging with DINOv2: Novel Approaches in Transformer Architecture and Performance Benchmarking
Author Affiliations & Notes
  • Jay Rodney Toby Zoellin
    Ophthalmology, Stadtspital Zurich Triemli, Zurich, Zürich, Switzerland
    Werner H Spross-Stiftung, Zurich, Switzerland
  • Colin Merk
    Ophthalmology, Stadtspital Zurich Triemli, Zurich, Zürich, Switzerland
    Department of Ophthalmology, Semmelweis Egyetem, Budapest, Budapest, Hungary
  • Mischa Buob
    Ophthalmology, Stadtspital Zurich Triemli, Zurich, Zürich, Switzerland
  • Samuel Giesser
    Ophthalmology, Stadtspital Zurich Triemli, Zurich, Zürich, Switzerland
    Werner H Spross-Stiftung, Zurich, Switzerland
  • Amr Saad
    Ophthalmology, Stadtspital Zurich Triemli, Zurich, Zürich, Switzerland
    Werner H Spross-Stiftung, Zurich, Switzerland
  • Tahm Spitznagel
    Ophthalmology, Stadtspital Zurich Triemli, Zurich, Zürich, Switzerland
    Werner H Spross-Stiftung, Zurich, Switzerland
  • Ferhat Turgut
    Ophthalmology, Stadtspital Zurich Triemli, Zurich, Zürich, Switzerland
    Gutblick Research, Switzerland
  • Matthias Becker
    Ophthalmology, Stadtspital Zurich Triemli, Zurich, Zürich, Switzerland
    Werner H Spross-Stiftung, Zurich, Switzerland
  • Gábor Somfai
    Ophthalmology, Stadtspital Zurich Triemli, Zurich, Zürich, Switzerland
    Department of Ophthalmology, Semmelweis Egyetem, Budapest, Budapest, Hungary
  • Footnotes
    Commercial Relationships   Jay Zoellin, None; Colin Merk, None; Mischa Buob, None; Samuel Giesser, None; Amr Saad, None; Tahm Spitznagel, None; Ferhat Turgut, None; Matthias Becker, None; Gábor Somfai, None
    Support  None
Investigative Ophthalmology & Visual Science July 2024, Vol.65, PB0059. doi:
  • Citation
      Jay Rodney Toby Zoellin, Colin Merk, Mischa Buob, Samuel Giesser, Amr Saad, Tahm Spitznagel, Ferhat Turgut, Matthias Becker, Gábor Somfai; Advancing Diabetic Retinopathy Staging with DINOv2: Novel Approaches in Transformer Architecture and Performance Benchmarking. Invest. Ophthalmol. Vis. Sci. 2024;65(9):PB0059.

      © ARVO (1962-2015); The Authors (2016-present)

Abstract

Purpose : We investigate the staging of diabetic retinopathy (DR) severity with the DINOv2 vision transformer (ViT), which is pre-trained on natural images. We fine-tune DINOv2 for DR classification from color fundus photographs (CFPs), introduce a block expansion (BE) strategy inspired by advances in the LLM field, and benchmark our models against RETFound, a domain-specific foundation model in ophthalmology.

Methods : We used publicly available datasets, each split into training, validation, and test sets at a ratio of 70:15:15. For self-supervised learning (SSL) pre-training on CFPs, we used images from the AIROGS and Kaggle EyePACS DR challenges.
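As a minimal sketch of the 70:15:15 split described above: the abstract does not say whether the split was random, stratified, or patient-level, so the seeded random shuffle and the function name `split_70_15_15` below are illustrative assumptions, not the authors' procedure.

```python
import random


def split_70_15_15(items, seed=0):
    """Shuffle items and split them into train/val/test at 70:15:15.

    Assumption: a plain seeded random split. The abstract does not state
    how the split was drawn.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    # Integer arithmetic avoids floating-point rounding in the split sizes.
    n_train = n * 70 // 100
    n_val = n * 15 // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


train, val, test = split_70_15_15(range(100))
print(len(train), len(val), len(test))
```

In practice a patient-level or grade-stratified split may be preferable, so that images from one patient do not appear in more than one subset.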
We proposed six models built on two foundation models: RETFound, a CFP foundation model, and DINOv2, a natural-image model. We fine-tuned each on the DR datasets by training either only the last layer (frozen backbone, BB) or the entire model (unfrozen BB). We also investigated the impact of fine-tuning DINOv2 on CFP images with the SSL method proposed by DINO. In the BE strategy, individual transformer blocks are duplicated and the copies are set to 0, and only the new blocks are fine-tuned. All models are described in Table 1.
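The block expansion idea can be illustrated with a toy sketch. It assumes that "set to 0" means zeroing the output weight of each duplicated residual block, so that the copy starts as an identity mapping and the expanded model initially reproduces the original one. The scalar "blocks", the `every` parameter, and all function names are hypothetical and stand in for real transformer blocks; this is not the authors' implementation.

```python
import copy


def make_block(w_in, w_out):
    # Toy stand-in for a transformer block, with scalar "weights".
    return {"w_in": w_in, "w_out": w_out, "trainable": True}


def block_forward(block, x):
    # Residual block: identity path plus a small ReLU branch.
    hidden = max(0.0, block["w_in"] * x)
    return x + block["w_out"] * hidden


def model_forward(blocks, x):
    for block in blocks:
        x = block_forward(block, x)
    return x


def expand_blocks(blocks, every=2):
    """Freeze the original blocks and insert a zeroed copy after every
    `every`-th block (assumed placement; the abstract does not specify it)."""
    expanded = []
    for i, block in enumerate(blocks, start=1):
        frozen = copy.deepcopy(block)
        frozen["trainable"] = False  # original blocks are not fine-tuned
        expanded.append(frozen)
        if i % every == 0:
            new_block = copy.deepcopy(block)
            new_block["w_out"] = 0.0  # zeroed output: block acts as identity
            new_block["trainable"] = True  # only new blocks are fine-tuned
            expanded.append(new_block)
    return expanded


blocks = [make_block(0.5, 0.3), make_block(-1.0, 0.2),
          make_block(1.5, -0.4), make_block(0.7, 0.1)]
expanded = expand_blocks(blocks, every=2)
print(len(blocks), len(expanded))
print(model_forward(blocks, 2.0) == model_forward(expanded, 2.0))
```

Because each new block contributes nothing at initialization, fine-tuning starts from the pre-trained model's behavior and only the added blocks move away from it.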

Results : Training only the last linear layer of DINOv2 (frozen DINOv2, best QKappa of 0.882) performed similarly to training the full BB (unfrozen DINOv2, best QKappa of 0.904). For RETFound, in contrast, freezing the BB reduced performance substantially (best QKappa of 0.672 and 0.908 for frozen and unfrozen BB, respectively). Models pre-trained on CFPs, with the BB frozen during supervised fine-tuning, performed similarly to frozen DINOv2 (best QKappa of 0.857 and 0.872 for CFP-DINOv2 and CFP-BE-DINOv2, respectively). On a single dataset, our BE strategy improved performance markedly. Results across all models and datasets are shown in Table 2.
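For readers unfamiliar with the metric, the following is a minimal sketch of how a QKappa score can be computed, assuming that QKappa denotes the quadratic weighted Cohen's kappa and that DR is graded on five ordinal levels (0 to 4). Neither assumption is stated in the abstract, and the function name is illustrative.

```python
def quadratic_weighted_kappa(y_true, y_pred, n_classes=5):
    """Quadratic weighted Cohen's kappa for integer grades 0..n_classes-1.

    Assumption: five DR grades by default. The result is undefined
    (division by zero) if all labels and predictions fall in one class.
    """
    n = len(y_true)
    # Confusion matrix: observed[i][j] counts true grade i predicted as j.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(n_classes))
                 for j in range(n_classes)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            # Penalty grows with the squared distance between grades.
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            expected = hist_true[i] * hist_pred[j] / n  # chance agreement
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected
    return 1.0 - weighted_observed / weighted_expected


print(quadratic_weighted_kappa([0, 1, 2, 3, 4], [0, 1, 2, 3, 4]))
```

A score of 1 indicates perfect agreement and 0 indicates chance-level agreement; errors between distant grades are penalized more heavily than errors between adjacent grades.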

Conclusions : ViTs can be adapted effectively for ophthalmic image classification tasks. Our results suggest that DINOv2 provides features that perform well for DR grading out of the box, whereas RETFound requires additional training.

This abstract was presented at the 2024 ARVO Imaging in the Eye Conference, held in Seattle, WA, May 4, 2024.

 

Table 1: Model names for the given architectures, pre-training and training methods, and backbone (BB) states.

 

Table 2: Performance of our models (QKappa and referable DR (rDR) accuracy) across all DR datasets employed.
