Investigative Ophthalmology & Visual Science
June 2024, Volume 65, Issue 7
Open Access
ARVO Annual Meeting Abstract | June 2024
DIViT-DR: Deep Learning Dual-Image Vision Transformer to Detect Diabetic Retinopathy from Stereoscopic Fundus Image Pairs
Author Affiliations & Notes
  • Justin Huynh
    Carle Illinois College of Medicine, Urbana, Illinois, United States
  • Footnotes
    Commercial Relationships: Justin Huynh, None
    Support: None
Investigative Ophthalmology & Visual Science June 2024, Vol.65, 5669. doi:
Citation: Justin Huynh; DIViT-DR: Deep Learning Dual-Image Vision Transformer to Detect Diabetic Retinopathy from Stereoscopic Fundus Image Pairs. Invest. Ophthalmol. Vis. Sci. 2024;65(7):5669.

© ARVO (1962-2015); The Authors (2016-present)
Abstract

Purpose : Diabetic retinopathy (DR), a leading cause of blindness worldwide, is a complication of diabetes mellitus, a systemic condition affecting many organ systems. A patient with DR often develops disease in both eyes; accordingly, the standard diagnostic workup for DR involves clinician assessment of both eyes. Given the prevalence of DR, many deep learning (DL) based artificial intelligence (AI) models are being developed as screening and diagnostic assistance tools for DR, analyzing fundus photographs or OCT images. While promising, a major limitation of existing models is that, unlike a clinician, they use images from only a single eye when making an assessment, which may limit their diagnostic capabilities. To overcome this limitation, we propose a DL model based on a customized Vision Transformer (ViT) architecture capable of analyzing images from both eyes when making an assessment.

Methods : 18,886 retinal funduscopic images from the EYEPACS dataset were used for training. Images were grouped into 9,443 pairs, each consisting of the OD and OS images from the same patient at the same visit; 9,000 pairs were used for training and 443 for testing. A base ViT architecture was customized at the input channel level to accept two images at once, generating a separate attention vector for each input image and concatenating them inside the model. This custom dual-image ViT was trained on the 9,000 image pairs to detect diabetic retinopathy (mild, moderate, or severe) and evaluated on the 443 held-out pairs. Single-image ResNet50 and ViT models were trained on the same images for comparison.
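The dual-input design described above can be sketched as follows. This is an illustrative reconstruction in PyTorch, not the authors' code: the class name `DualImageViT`, the toy dimensions, and the mean-pooled fusion are assumptions; the abstract specifies only that the two input images yield separate attention streams that are concatenated inside the model before classification.

```python
# Hedged sketch of a two-input ViT (assumed design, not the authors' code):
# each eye's image is patch-embedded, passed through its own transformer
# encoder stream, pooled, and the two streams are concatenated for the head.
import torch
import torch.nn as nn

class DualImageViT(nn.Module):
    def __init__(self, img_size=64, patch=16, dim=48, depth=2,
                 heads=3, n_classes=2):
        super().__init__()
        # patchify: (B, 3, H, W) -> dim-channel patch grid
        self.patchify = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=4 * dim,
                                           batch_first=True)
        # one encoder per input image -> two separate attention streams
        # (nn.TransformerEncoder deep-copies `layer`, so the streams
        # do not share parameters)
        self.enc_od = nn.TransformerEncoder(layer, num_layers=depth)
        self.enc_os = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(2 * dim, n_classes)  # fusion by concatenation

    def _tokens(self, x):
        # (B, 3, H, W) -> (B, N, dim) sequence of patch tokens
        return self.patchify(x).flatten(2).transpose(1, 2)

    def forward(self, od, os_):
        z_od = self.enc_od(self._tokens(od)).mean(dim=1)   # pooled OD stream
        z_os = self.enc_os(self._tokens(os_)).mean(dim=1)  # pooled OS stream
        return self.head(torch.cat([z_od, z_os], dim=-1))

model = DualImageViT()
od_img = torch.randn(2, 3, 64, 64)   # batch of 2 right-eye images
os_img = torch.randn(2, 3, 64, 64)   # matching left-eye images
logits = model(od_img, os_img)
print(tuple(logits.shape))  # (2, 2): one logit vector per patient pair
```

In this sketch the fusion is a simple concatenation of the two pooled streams, which matches the abstract's description of concatenating per-eye attention vectors inside the model; real fundus inputs would use standard 224×224 ViT dimensions rather than the toy sizes here.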

Results : The dual-image ViT achieved an AUC of 0.94 in detecting DR. The single-image ViT achieved an AUC of 0.86, while the ResNet50 achieved an AUC of 0.74. Attention vectors from the dual-image model reveal diffuse attention throughout the retina, focused on hallmark features of DR including microaneurysms and neovascularization. Notably, attention vectors for the OD and OS eyes of the same patient were distinct, indicating that the model attended to separate features in each eye.
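For readers unfamiliar with the reported metric, AUC here is the area under the ROC curve computed on the held-out pairs. A minimal sketch of how such a figure is computed with scikit-learn, using synthetic labels and scores (not the study's data):

```python
# Illustrative AUC computation on a tiny synthetic held-out set.
# y_true: ground-truth DR labels; y_score: model probabilities for DR.
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1, 1, 0]
y_score = [0.1, 0.4, 0.8, 0.9, 0.35, 0.2]

auc = roc_auc_score(y_true, y_score)
print(round(auc, 3))  # 0.889 for this synthetic example
```

An AUC of 0.94 means that, for a randomly chosen DR-positive and DR-negative pair, the model ranks the positive case higher about 94% of the time.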

Conclusions : The dual-image ViT outperformed single-image models by a large margin, utilizing hallmark features of DR in both eyes to make its assessment.

This abstract was presented at the 2024 ARVO Annual Meeting, held in Seattle, WA, May 5-9, 2024.
