Investigative Ophthalmology & Visual Science
ARVO Annual Meeting Abstract | June 2024 | Volume 65, Issue 7 | Open Access
Multimodal AI diagnosis of retinal detachment incorporating fundus images and patient questionnaires
Author Affiliations & Notes
  • Naoyuki Yonemaru
    CRESCO LTD., Japan
  • Hitoshi Tabuchi
    Tsukazaki Hospital, Himeji, Hyogo, Japan
    Department of Technology and Design Thinking for Medicine, Hiroshima University, Hiroshima, Japan
  • Hodaka Deguchi
    Tsukazaki Hospital, Himeji, Hyogo, Japan
  • Yuto Omi
    Tsukazaki Hospital, Himeji, Hyogo, Japan
  • Mao Tanabe
    Tsukazaki Hospital, Himeji, Hyogo, Japan
  • Naofumi Ishitobi
    Tsukazaki Hospital, Himeji, Hyogo, Japan
  • Hisashi Maruyama
    CRESCO LTD., Japan
  • Yuji Ayatsuka
    CRESCO LTD., Japan
  • Footnotes
    Commercial Relationships   Naoyuki Yonemaru, None; Hitoshi Tabuchi, Thinkout LTD, Code E (Employment); GLORY LTD., TOPCON CORPORATION, CRESCO LTD., OLBA Healthcare Holdings Ltd., Tomey Corporation, HOYA Corporation, Code F (Financial Support); Japanese Patent Nos. 6419055, 6695171, 7139548, 7339483, 7304508, 7060854, Code P (Patent); Hodaka Deguchi, None; Yuto Omi, None; Mao Tanabe, None; Naofumi Ishitobi, None; Hisashi Maruyama, None; Yuji Ayatsuka, None
  • Footnotes
    Support  None
Investigative Ophthalmology & Visual Science June 2024, Vol. 65, 2321.
Abstract

Purpose: In recent years, deep learning models have been developed to diagnose retinal detachment (RD) from fundus images. However, patient-reported symptoms also provide valuable information for an accurate diagnosis, and in clinical practice both symptoms and image findings are weighed together. The purpose of this study was twofold: to construct a system that combines image analysis with patient-reported questionnaires to reach a diagnosis the way a human clinician does, and to improve the overall diagnostic performance of the system.

Methods: The model developed in this study, shown in the Figure, is a modified version of CLIP (Contrastive Language-Image Pre-training) designed for binary classification of RD versus normal cases. CLIP learns joint distributed representations of images and language, enabling accurate classification even on unseen images.
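The sketch below illustrates this fusion design in PyTorch: image and text features are extracted by separate encoders, concatenated, and passed to a binary head. The encoder stand-ins, feature dimensions, and head architecture are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class MultimodalRDClassifier(nn.Module):
    def __init__(self, image_encoder: nn.Module, text_encoder: nn.Module,
                 image_dim: int = 512, text_dim: int = 512):
        super().__init__()
        self.image_encoder = image_encoder  # e.g. a CLIP vision tower
        self.text_encoder = text_encoder    # e.g. a CLIP text tower
        # Binary head over the concatenated image/text features.
        self.classifier = nn.Sequential(
            nn.Linear(image_dim + text_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 2),  # logits for [normal, RD]
        )

    def forward(self, image: torch.Tensor, questionnaire: torch.Tensor) -> torch.Tensor:
        img_feat = self.image_encoder(image)          # (B, image_dim)
        txt_feat = self.text_encoder(questionnaire)   # (B, text_dim)
        fused = torch.cat([img_feat, txt_feat], dim=-1)
        return self.classifier(fused)

# Toy stand-ins so the sketch runs end to end; in practice these would be
# pretrained CLIP towers fed fundus images and tokenized questionnaire text.
image_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 512))
text_encoder = nn.Linear(128, 512)

model = MultimodalRDClassifier(image_encoder, text_encoder)
logits = model(torch.randn(4, 3, 224, 224), torch.randn(4, 128))
print(logits.shape)  # torch.Size([4, 2])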
For training the model, a dataset of 83 cases with RD and 107 cases without RD, collected from Tsukazaki Hospital in Japan, was used. To obtain reliable performance estimates, 5-fold cross-validation was employed.
For comparison, two additional models were trained: one using only fundus images and the other using only patient questionnaires. These models were trained with the same dataset and cross-validation scheme as the multimodal model.
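As a rough sketch of how such folds can be built for the 190 cases (83 RD, 107 non-RD), the snippet below uses scikit-learn's StratifiedKFold; stratification and seeding are assumptions, since the abstract does not state how the folds were drawn.

import numpy as np
from sklearn.model_selection import StratifiedKFold

labels = np.array([1] * 83 + [0] * 107)  # 1 = RD, 0 = non-RD (assumed encoding)
cases = np.arange(labels.size)           # case indices standing in for the data

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(skf.split(cases, labels)):
    # Each fold holds out ~38 cases; every case is tested exactly once.
    print(f"fold {fold}: train={train_idx.size} test={test_idx.size} "
          f"RD in test={labels[test_idx].sum()}")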

Results: The multimodal model achieved an accuracy of 86.8%, outperforming the image-only model (83.2%) and the questionnaire-only model (71.6%). The image-only model had a higher recall (86.7%) than the multimodal model (84.3%), but the multimodal model improved markedly in precision (85.4%) over both the image-only model (77.4%) and the questionnaire-only model (65.3%). This gain in precision carried over to the F1-score: 84.8% for the multimodal model, 81.8% for the image-only model, and 69.7% for the questionnaire-only model.
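As a consistency check, the reported F1-scores match the harmonic mean of the stated precision/recall pairs (recall for the questionnaire-only model is not given, so it is omitted):

def f1(precision: float, recall: float) -> float:
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

print(f"multimodal: {f1(0.854, 0.843):.1%}")  # -> 84.8%
print(f"image-only: {f1(0.774, 0.867):.1%}")  # -> 81.8%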

Conclusions: Combining fundus images with patient questionnaires in a multimodal model improves the overall diagnostic performance for RD.

This abstract was presented at the 2024 ARVO Annual Meeting, held in Seattle, WA, May 5-9, 2024.

Figure. An image and a questionnaire are input into an image encoder and a text encoder, respectively. The extracted features from each encoder are concatenated and classified as RD or normal by a binary classifier.
