Investigative Ophthalmology & Visual Science
July 2024, Volume 65, Issue 9
Open Access
ARVO Imaging in the Eye Conference Abstract
Evaluating the practicability of natural-domain and domain-specific foundation models for ophthalmic image classification
Author Affiliations & Notes
  • Gabor Mark Somfai
    Department of Ophthalmology, Stadtspital Zurich Triemli, Zurich, Zürich, Switzerland
    Ophthalmology, Semmelweis Egyetem, Budapest, Budapest, Hungary
  • Jay Rodney Toby Zoellin
    Department of Ophthalmology, Stadtspital Zurich Triemli, Zurich, Zürich, Switzerland
  • Colin Merk
    Department of Ophthalmology, Stadtspital Zurich Triemli, Zurich, Zürich, Switzerland
  • Mischa Buob
    Department of Ophthalmology, Stadtspital Zurich Triemli, Zurich, Zürich, Switzerland
  • Samuel Giesser
    Department of Ophthalmology, Stadtspital Zurich Triemli, Zurich, Zürich, Switzerland
  • Amr Saad
    Department of Ophthalmology, Stadtspital Zurich Triemli, Zurich, Zürich, Switzerland
  • Tahm Spitznagel
    Department of Ophthalmology, Stadtspital Zurich Triemli, Zurich, Zürich, Switzerland
  • Ferhat Turgut
    Department of Ophthalmology, Stadtspital Zurich Triemli, Zurich, Zürich, Switzerland
    Gutblick Research, Switzerland
  • Matthias Becker
    Department of Ophthalmology, Stadtspital Zurich Triemli, Zurich, Zürich, Switzerland
    Gutblick Research, Switzerland
  • Footnotes
    Commercial Relationships  Gabor Mark Somfai, None; Jay Zoellin, None; Colin Merk, None; Mischa Buob, None; Samuel Giesser, None; Amr Saad, None; Tahm Spitznagel, None; Ferhat Turgut, None; Matthias Becker, None
    Support  None
Investigative Ophthalmology & Visual Science July 2024, Vol.65, PB0022. doi:
Gabor Mark Somfai, Jay Rodney Toby Zoellin, Colin Merk, Mischa Buob, Samuel Giesser, Amr Saad, Tahm Spitznagel, Ferhat Turgut, Matthias Becker; Evaluating the practicability of natural-domain and domain-specific foundation models for ophthalmic image classification. Invest. Ophthalmol. Vis. Sci. 2024;65(9):PB0022.

      © ARVO (1962-2015); The Authors (2016-present)

Abstract

Purpose : Deploying deep learning models for medical imaging in clinical settings is often challenging due to their limited generalizability. In this study, we evaluate the fine-tuning of two pre-trained foundation models (natural-domain: DINOv2; domain-specific: RETFound) as potential solutions to this problem. We investigate the generalizability, the quantity of images needed, and the computational requirements of these adapted models for the downstream task of ophthalmologic disease classification from color fundus photographs (CFPs).

Methods : We utilized 5922 CFPs from publicly available datasets with corresponding DR-stage labels, using a training, validation, and test split of 70:15:15. We evaluated the four adapted models derived from DINOv2 and RETFound that showed the most promising performance in our previous research; a detailed tabulation of these models can be found in Table 1.
To evaluate the image quantity required for downstream model adaptation, a few-shot study was performed, and the computational requirements of each model were recorded. To estimate generalizability, cross-evaluation was performed by training models on DR datasets distinct from the testing datasets; results are reported as average QKappa in Table 1.
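The abstract reports performance as QKappa without defining it; for DR-stage grading this conventionally denotes quadratically weighted Cohen's kappa, which penalizes misgradings by the squared distance between ordinal grades. A minimal stdlib-only sketch under that assumption (not the authors' evaluation code):

```python
from collections import Counter

def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """Quadratically weighted Cohen's kappa for ordinal labels 0..n_classes-1."""
    n = len(y_true)
    obs = Counter(zip(y_true, y_pred))   # observed confusion counts
    p_true = Counter(y_true)             # marginals for the chance-agreement term
    p_pred = Counter(y_pred)
    num = 0.0  # weighted observed disagreement
    den = 0.0  # weighted disagreement expected by chance
    for i in range(n_classes):
        for j in range(n_classes):
            w = (i - j) ** 2 / (n_classes - 1) ** 2
            num += w * obs.get((i, j), 0) / n
            den += w * (p_true.get(i, 0) / n) * (p_pred.get(j, 0) / n)
    return 1.0 - num / den

# Perfect agreement on the five DR grades yields kappa = 1.0
print(quadratic_weighted_kappa([0, 1, 2, 3, 4], [0, 1, 2, 3, 4], 5))  # → 1.0
```

For two classes the quadratic weighting reduces to the ordinary unweighted Cohen's kappa.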

Results : Among all models investigated, our CFP-BE-DINOv2 model showed the best transfer across datasets (average QKappa of 0.605). Adapting DINOv2 resulted in smaller models than adapting RETFound, and all evaluated models can be trained on consumer-grade GPUs. The computational requirements for training each model on our largest dataset (APTOS, 3992 images) are given in Table 1. As shown in Figure 1, all models reached a QKappa threshold of 0.7 when trained with only 32 samples per class on APTOS. Among all models, the frozen DINOv2 model yielded the best performance when trained on 2-64 samples, outperforming the unfrozen RETFound model (QKappa of 0.814 vs. 0.751, respectively, with 32 samples per class).
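The few-shot protocol above trains on a fixed number of samples per class. A hypothetical stdlib-only sampler illustrating that selection (the helper name and dataset layout are ours, not the authors' code):

```python
import random

def sample_k_per_class(labels, k, seed=42):
    """Return indices of up to k examples per class, as in a few-shot split.

    `labels` is a list of integer class labels (e.g. DR grades 0-4).
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, lbl in enumerate(labels):
        by_class.setdefault(lbl, []).append(idx)
    chosen = []
    for lbl, idxs in sorted(by_class.items()):
        # Sample without replacement, capped by class availability.
        chosen.extend(rng.sample(idxs, min(k, len(idxs))))
    return chosen

# Toy example: 100 images across 5 DR grades (20 each), 8 shots per class.
labels = [i % 5 for i in range(100)]
subset = sample_k_per_class(labels, 8)
print(len(subset))  # → 40
```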

Conclusions : Our CFP-BE-DINOv2 model achieved the best generalizability across distinct datasets. Our few-shot study suggests that training only the linear layer of DINOv2 (frozen DINOv2) may suffice, facilitating model development with minimal computational requirements.
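Training only the linear layer of a frozen backbone (linear probing) means each image is reduced to a fixed embedding and only a linear head is optimized. The sketch below uses fabricated stand-in features in place of real DINOv2 embeddings (which would normally be extracted with the published DINOv2 checkpoints); it illustrates the technique, not the authors' pipeline.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

rng = random.Random(0)
DIM, N = 8, 200

def fake_frozen_features(label):
    # Stand-in for a frozen-backbone embedding: class-dependent mean plus noise.
    return [label + rng.gauss(0, 0.5) for _ in range(DIM)]

labels = [rng.randint(0, 1) for _ in range(N)]
feats = [fake_frozen_features(y) for y in labels]

# Train only the linear head (logistic regression via SGD); the "backbone"
# that produced `feats` stays untouched, mirroring the frozen-DINOv2 setup.
w, b, lr = [0.0] * DIM, 0.0, 0.05
for _ in range(100):
    for x, y in zip(feats, labels):
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        g = p - y
        w = [wi - lr * g * xi for wi, xi in zip(w, x)]
        b -= lr * g

acc = sum((sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) > 0.5) == (y == 1)
          for x, y in zip(feats, labels)) / N
print(f"train accuracy of the linear probe: {acc:.2f}")
```

Because the backbone contributes no trainable parameters, the optimization touches only DIM + 1 weights, which is why this regime fits on minimal hardware.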

This abstract was presented at the 2024 ARVO Imaging in the Eye Conference, held in Seattle, WA, May 4, 2024.

 

Figure 1: Few-shot study results.

Table 1: Model information, computational requirements, and cross-evaluation performance.
