Abstract
Purpose:
In recent years, deep learning models have been developed to diagnose retinal detachment (RD) from fundus images. However, patient-reported symptoms also provide valuable information for accurate diagnosis, and in clinical practice both symptoms and image findings are considered together. The purpose of this study was twofold: to construct a system that combines image analysis with patient-reported questionnaires so as to reach a diagnosis in a manner similar to a human clinician, and to enhance the system's overall diagnostic performance.
Methods:
The model developed in this study, as described in the Figure, is a modified version of CLIP (Contrastive Language-Image Pre-training) adapted for binary classification of RD versus normal cases. CLIP learns joint distributed representations of images and language, enabling accurate classification even on previously unseen images.
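The abstract does not specify the fusion architecture; a minimal sketch of one plausible design, in which CLIP-style image and text embeddings are concatenated and passed to a two-class head, is shown below. The embedding dimension and the classifier head are illustrative assumptions, not details from the study.

```python
import torch
import torch.nn as nn

class MultimodalRDClassifier(nn.Module):
    """Illustrative fusion head (hypothetical): concatenates image and
    text embeddings and maps them to two logits (RD vs. normal)."""

    def __init__(self, embed_dim: int = 512):
        super().__init__()
        # embed_dim = 512 is an assumption; CLIP variants differ.
        self.head = nn.Linear(embed_dim * 2, 2)

    def forward(self, image_emb: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # Late fusion: concatenate the two modality embeddings.
        fused = torch.cat([image_emb, text_emb], dim=-1)
        return self.head(fused)

model = MultimodalRDClassifier()
img = torch.randn(4, 512)  # stand-in for fundus-image embeddings
txt = torch.randn(4, 512)  # stand-in for questionnaire-text embeddings
logits = model(img, txt)   # shape: (batch, 2)
```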
For training the model, a dataset of 83 cases with RD and 107 cases without RD, collected from Tsukazaki Hospital in Japan, was used. To ensure the reliability of the model's performance, a 5-fold cross-validation technique was employed.
For comparison, two additional models were trained: one using only fundus images and the other using only patient questionnaires. Both were trained with the same dataset and cross-validation procedure as the multimodal model.
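The split described above can be sketched as follows. The use of stratification (preserving the 83/107 class balance in each fold) is an assumption; the abstract states only that 5-fold cross-validation was used.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical labels matching the reported class counts: 83 RD, 107 normal.
y = np.array([1] * 83 + [0] * 107)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder features

# 5-fold cross-validation, as in the study; stratification is assumed.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_sizes = [len(test_idx) for _, test_idx in skf.split(X, y)]
# Every case serves as a held-out test sample exactly once across the 5 folds.
```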
Results:
The multimodal model achieved an accuracy of 86.8%, outperforming the image-only model (83.2%) and the questionnaire-only model (71.6%). The image-only model had higher recall (86.7%) than the multimodal model (84.3%), but the multimodal model showed a marked improvement in precision (85.4%) over both the image-only model (77.4%) and the questionnaire-only model (65.3%). This gain in precision also raised the F1-score: 84.8% for the multimodal model, 81.8% for the image-only model, and 69.7% for the questionnaire-only model.
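As a consistency check, the F1-score is the harmonic mean of precision and recall; computing it from the reported averages reproduces the multimodal and image-only F1 values (for the questionnaire-only model, per-fold averaging can make the aggregate formula differ slightly from the reported mean).

```python
def f1(precision: float, recall: float) -> float:
    """F1-score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Reported metrics, as fractions.
multimodal_f1 = round(f1(0.854, 0.843), 3)  # -> 0.848, matching 84.8%
image_only_f1 = round(f1(0.774, 0.867), 3)  # -> 0.818, matching 81.8%
```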
Conclusions:
Combining fundus images with patient questionnaires in a multimodal model improves the overall diagnostic performance for RD.
This abstract was presented at the 2024 ARVO Annual Meeting, held in Seattle, WA, May 5-9, 2024.