Investigative Ophthalmology & Visual Science
June 2024
Volume 65, Issue 7
Open Access
ARVO Annual Meeting Abstract  |   June 2024
ChatGPT and Ocular Ultrasound: Pioneering Enhanced Visual Question Answering in Medical Diagnostics
Author Affiliations & Notes
  • Fan Song
    School of Optometry, The Hong Kong Polytechnic University, Hong Kong, China
  • Weiyi Zhang
    School of Optometry, The Hong Kong Polytechnic University, Hong Kong, China
  • Yao Li
    University of Waterloo, Waterloo, Ontario, Canada
  • Yanxian Chen
    School of Optometry, The Hong Kong Polytechnic University, Hong Kong, China
  • Yingfeng Zheng
    Sun Yat-Sen University, Guangzhou, Guangdong, China
  • Wei Peng
    Huawei Technologies Co Ltd, Shenzhen, Guangdong, China
  • Danli Shi
    School of Optometry, The Hong Kong Polytechnic University, Hong Kong, China
  • Mingguang He
    School of Optometry, The Hong Kong Polytechnic University, Hong Kong, China
  • Footnotes
    Commercial Relationships: Fan Song None; Weiyi Zhang None; Yao Li None; Yanxian Chen None; Yingfeng Zheng None; Wei Peng None; Danli Shi None; Mingguang He None
    Support: Global STEM Professorship Scheme (P0046113)
Investigative Ophthalmology & Visual Science June 2024, Vol.65, 2364. doi:
      Fan Song, Weiyi Zhang, Yao Li, Yanxian Chen, Yingfeng Zheng, Wei Peng, Danli Shi, Mingguang He; ChatGPT and Ocular Ultrasound: Pioneering Enhanced Visual Question Answering in Medical Diagnostics. Invest. Ophthalmol. Vis. Sci. 2024;65(7):2364.

      © ARVO (1962-2015); The Authors (2016-present)

Abstract

Purpose : Generative Pre-trained Transformer (GPT)-based large language models have made significant advances in various domains, but their ability to process medical images remains limited. Applying ChatGPT in clinical ophthalmology could reduce workload and improve patient care. We aimed to develop an ocular ultrasound visual question answering (VQA) model with the help of ChatGPT to facilitate the interpretation of ultrasound reports.

Methods : We collected information from ocular ultrasound reports written by experienced physicians and used ChatGPT to create question-answer (QA) pairs of various question types. The QA pairs underwent quality-control filtering and were then used to fine-tune a multi-modal transformer model for VQA and report generation. VQA performance was evaluated using language-based metrics, namely the Bilingual Evaluation Understudy (BLEU), the Consensus-based Image Description Evaluation (CIDEr), the Recall-Oriented Understudy for Gisting Evaluation - Longest Common Subsequence (ROUGE-L), and the Semantic Propositional Image Caption Evaluation (SPICE), as well as classification metrics on question types and disease conditions. In addition, one ophthalmologist subjectively reviewed 100 images from the test set to help improve the model’s performance.
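
Of the language-based metrics listed above, ROUGE-L is the simplest to illustrate, because it depends only on the longest common subsequence (LCS) between a generated answer and a reference answer. The following is a minimal sketch of the standard sentence-level ROUGE-L F-measure, not the evaluation code used in this study. The whitespace tokenization, the default `beta=1.0`, and the example sentences are assumptions made for illustration; published evaluation toolkits may tokenize differently and use a different `beta`.

```python
def lcs_length(a, b):
    """Return the length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)  # DP row for the previous token of `a`
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference, beta=1.0):
    """Sentence-level ROUGE-L F-measure on lowercased, whitespace-split tokens.

    `beta` weights recall against precision; beta=1.0 (an assumption here)
    gives the harmonic mean of the two.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)


if __name__ == "__main__":
    # Hypothetical model answer and reference answer, invented for this sketch.
    generated = "the retina is detached in the right eye"
    reference = "the retina is detached"
    print(f"ROUGE-L = {rouge_l(generated, reference):.3f}")
```

Because the LCS preserves word order without requiring adjacency, an answer that contains the reference wording plus extra words keeps full recall but loses precision.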

Results : Our study included 6,073 ocular ultrasound reports from distinct patients, covering 42 disease-related conditions. ChatGPT produced 101,417 QA pairs from the reports, which were used to fine-tune our ultrasound-VQA model. The model achieved BLEU scores (1-4) of 0.58, 0.54, 0.52, and 0.5, ROUGE-L of 0.57, SPICE of 0.51, and CIDEr of 2.54. The accuracies for binary-choice and multiple-choice QAs were 0.89 and 0.77, respectively. Manual assessment of 100 images (2,185 QA pairs) identified 6 (0.3%) QA pairs with unrelated information, 146 (6.7%) with apparent factual errors, and 32 (1.5%) containing insufficient information to provide an answer.
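
The percentages reported for the manual assessment follow directly from the counts and the 2,185 reviewed QA pairs. The short check below reproduces them; the counts come from the abstract, while the dictionary keys and the function name are labels chosen for this sketch.

```python
def percentages(counts, total):
    """Convert raw counts to percentages of `total`, rounded to one decimal place."""
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Counts reported for the manual review of 100 test images.
total_pairs = 2185
findings = {
    "unrelated information": 6,
    "apparent factual errors": 146,
    "insufficient information": 32,
}

if __name__ == "__main__":
    for label, pct in percentages(findings, total_pairs).items():
        print(f"{label}: {pct}%")
```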

Conclusions : This study demonstrated the effectiveness and potential of using ChatGPT for VQA tasks on ultrasound images. By combining generative learning and vision-language pretraining while taking their limitations into account, this work supports the feasibility of large language models for medical image analysis.

This abstract was presented at the 2024 ARVO Annual Meeting, held in Seattle, WA, May 5-9, 2024.

 
