Investigative Ophthalmology & Visual Science
ARVO Annual Meeting Abstract | June 2024 | Volume 65, Issue 7 | Open Access
Explore Vision-Language Model with Hierarchical Information for Multiple Retinal Disease Recognition
Author Affiliations & Notes
  • Lie Ju
    Monash University, Clayton, Victoria, Australia
    University College London, London, United Kingdom
  • Yukun Zhou
    University College London, London, United Kingdom
    Moorfields Eye Hospital, United Kingdom
  • Peng Xia
    Monash University, Clayton, Victoria, Australia
  • Daniel Alexander
    University College London, London, United Kingdom
  • Pearse Andrew Keane
    Moorfields Eye Hospital, United Kingdom
    University College London, London, United Kingdom
  • Zongyuan Ge
    Monash University, Clayton, Victoria, Australia
  • Footnotes
    Commercial Relationships   Lie Ju None; Yukun Zhou None; Peng Xia None; Daniel Alexander None; Pearse Keane Apellis, Code C (Consultant/Contractor), Allergan, Topcon, Heidelberg Engineering, Novartis, Roche, Bayer, Code F (Financial Support), Big Picture Medical, Code I (Personal Financial Interest); Zongyuan Ge None
  • Footnotes
    Support  N/A
Investigative Ophthalmology & Visual Science June 2024, Vol.65, 1593. doi:

      Lie Ju, Yukun Zhou, Peng Xia, Daniel Alexander, Pearse Andrew Keane, Zongyuan Ge; Explore Vision-Language Model with Hierarchical Information for Multiple Retinal Disease Recognition. Invest. Ophthalmol. Vis. Sci. 2024;65(7):1593.

Abstract

Purpose : Many existing methods have proved effective for detecting specific retinal diseases, e.g., diabetic retinopathy (DR) grading. However, these methods may fail to detect multiple retinal diseases simultaneously. In this study, we build hierarchical information about retinal diseases into a deep learning model, utilizing a vision-language model, to improve the performance of multiple retinal disease recognition.

Methods : This study involved more than one million fundus images covering 53 kinds of retinal conditions/findings, collected from private hospitals over a period of 10 years. We adopted CLIP (https://github.com/openai/CLIP) as the backbone of our vision-language model. Two training strategies were developed for the comparison study: (1) To build the image-text paired inputs, we designed a 3-level hierarchical caption as the language-model input for each fundus image, e.g., “An image of mild non-proliferative diabetic retinopathy (low level), diabetic retinopathy (middle level), vessel (high level).” (2) We trained the baseline model with captions lacking hierarchical information, e.g., “An image of mild non-proliferative diabetic retinopathy.” The CLIP model was fine-tuned on the privately collected dataset and externally evaluated on the public ODIR dataset for recognition of 12 retinal disease categories: normal, DR (mild/moderate/severe NPDR or PDR), cataract, glaucoma, hypertensive retinopathy, dry/wet AMD, pathological myopia, and other conditions.
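
The following is a minimal sketch of the two captioning strategies and a standard CLIP contrastive fine-tuning step, assuming the public CLIP library (https://github.com/openai/CLIP). The hierarchy mapping, caption templates beyond the example above, backbone variant, and training details are illustrative assumptions rather than the authors' exact implementation.

import torch
import torch.nn.functional as F
import clip  # pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)  # backbone choice is an assumption
model.float()  # cast to fp32 for stable fine-tuning

# Hypothetical fine-label -> (middle level, high level) mapping; the remaining
# conditions/findings would be mapped analogously.
HIERARCHY = {
    "mild non-proliferative diabetic retinopathy": ("diabetic retinopathy", "vessel"),
}

def hierarchical_caption(fine_label):
    # Strategy (1): 3-level hierarchical caption.
    middle, high = HIERARCHY[fine_label]
    return f"An image of {fine_label}, {middle}, {high}."

def flat_caption(fine_label):
    # Strategy (2), baseline: caption without hierarchical information.
    return f"An image of {fine_label}."

def contrastive_step(images, captions):
    # Symmetric image-text contrastive loss for one batch of fundus images
    # (already passed through `preprocess`) and their paired captions.
    img = F.normalize(model.encode_image(images.to(device)), dim=-1)
    tokens = clip.tokenize(captions, truncate=True).to(device)
    txt = F.normalize(model.encode_text(tokens), dim=-1)
    logits = model.logit_scale.exp() * img @ txt.t()
    targets = torch.arange(len(captions), device=device)
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

In this sketch, swapping hierarchical_caption for flat_caption reproduces the two strategies being compared; the backbone and contrastive loss are shared.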

Results : Hierarchical training significantly improved detection performance compared with the baseline model. Notable gains include DR grading, which improved from 93.56% to 96.32% AUC; cataract detection, from 97.23% to 97.92% AUC; glaucoma detection, from 89.43% to 92.45% AUC; hypertensive retinopathy, from 90.51% to 91.37% AUC; dry/wet AMD detection, from 95.28% to 97.19% AUC; pathological myopia detection, from 94.97% to 95.82% AUC; and detection of other conditions, from 93.82% to 94.06% AUC.
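
The abstract does not detail the scoring protocol behind these AUCs; the sketch below shows one plausible way to compute per-class AUC from image-caption similarity scores of the tuned model, assuming multi-hot ODIR ground-truth labels and scikit-learn.

import numpy as np
from sklearn.metrics import roc_auc_score

def per_class_auc(y_true, y_score, class_names):
    # y_true: (N, C) multi-hot labels; y_score: (N, C) per-class scores
    # (e.g., cosine similarity between image embeddings and class-caption embeddings).
    return {name: roc_auc_score(y_true[:, c], y_score[:, c])
            for c, name in enumerate(class_names)}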

Conclusions : The integration of vision-language models, which leverage images paired with text, demonstrates promising performance in multiple retinal disease recognition. The hierarchical caption design further improves the model's effectiveness. Future research will explore extending this approach to other modalities, such as OCT images, for practical clinical translation.

This abstract was presented at the 2024 ARVO Annual Meeting, held in Seattle, WA, May 5-9, 2024.

 

Figure: The illustration of the overall pipeline.