Investigative Ophthalmology & Visual Science Cover Image for Volume 65, Issue 7
June 2024
Volume 65, Issue 7
Open Access
ARVO Annual Meeting Abstract  |   June 2024
Slit-lamp-GPT: Application of Large Language Models for Slit Lamp Image Report Generation and Question Answering
Author Affiliations & Notes
  • Ziwei Zhao
    School of Optometry, The Hong Kong Polytechnic University, Hong Kong, Hong Kong
  • Weiyi Zhang
    School of Optometry, The Hong Kong Polytechnic University, Hong Kong, Hong Kong
  • Xiaolan Chen
    Sun Yat-Sen University Zhongshan Ophthalmic Center State Key Laboratory of Ophthalmology, Guangzhou, Guangdong, China
  • Fan Song
    School of Optometry, The Hong Kong Polytechnic University, Hong Kong, Hong Kong
  • James Gunasegaram
    Monash University, Clayton, Victoria, Australia
  • Wenyong Huang
    Sun Yat-Sen University Zhongshan Ophthalmic Center State Key Laboratory of Ophthalmology, Guangzhou, Guangdong, China
  • Danli Shi
    School of Optometry, The Hong Kong Polytechnic University, Hong Kong, Hong Kong
  • Mingguang He
    School of Optometry, The Hong Kong Polytechnic University, Hong Kong, Hong Kong
    Sun Yat-Sen University Zhongshan Ophthalmic Center State Key Laboratory of Ophthalmology, Guangzhou, Guangdong, China
  • Footnotes
    Commercial Relationships   Ziwei Zhao None; Weiyi Zhang None; Xiaolan Chen None; Fan Song None; James Gunasegaram None; Wenyong Huang None; Danli Shi None; Mingguang He None
  • Footnotes
    Support  Global STEM Professorship Scheme, number: P0046113
Investigative Ophthalmology & Visual Science June 2024, Vol.65, 2368. doi:
      Ziwei Zhao, Weiyi Zhang, Xiaolan Chen, Fan Song, James Gunasegaram, Wenyong Huang, Danli Shi, Mingguang He; Slit-lamp-GPT: Application of Large Language Models for Slit Lamp Image Report Generation and Question Answering. Invest. Ophthalmol. Vis. Sci. 2024;65(7):2368.
      © ARVO (1962-2015); The Authors (2016-present)

Abstract

Purpose: To fine-tune a multimodal, transformer-based model for generating medical reports from slit-lamp images, and to develop a subsequent question-answering (QA) system using Llama 2. We term the entire process Slit-lamp-GPT (Generative Pre-trained Transformer).

Methods: Our study used a dataset of 25,051 slit-lamp images from 3,409 participants, paired with the corresponding physician-written medical reports. The data were divided into training, validation, and test sets and used to fine-tune the Bootstrapping Language-Image Pre-training (BLIP) framework for report generation. The generated text reports and human-posed questions were then fed into Llama 2 for interactive question answering. We evaluated performance using quantitative metrics (including BLEU, CIDEr, ROUGE-L, SPICE, accuracy, sensitivity, specificity, precision, and F1-score) and subjective assessments from two experienced ophthalmologists, who rated the outputs on a 1-3 scale (1 indicating high quality).
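Several of the report-quality metrics listed above are n-gram based. As an illustration of the idea (a minimal sketch, not the authors' evaluation code), sentence-level BLEU-n can be computed as the brevity-penalized geometric mean of modified 1- to n-gram precisions against a single reference report:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, reference, n):
    """Fraction of candidate n-grams found in the reference (clipped counts)."""
    cand = Counter(ngrams(candidate, n))
    ref = Counter(ngrams(reference, n))
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / max(sum(cand.values()), 1)

def bleu(candidate, reference, max_n=4):
    """Single-reference BLEU-max_n: brevity penalty times the geometric
    mean of modified 1..max_n-gram precisions."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    # Penalize candidates shorter than the reference.
    bp = 1.0 if len(candidate) > len(reference) \
        else math.exp(1 - len(reference) / max(len(candidate), 1))
    return bp * math.exp(log_avg)
```

For example, a generated report identical to the reference scores 1.0, while a shorter report whose n-grams all appear in the reference is discounted only by the brevity penalty.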

Results: A total of 50 conditions related to diseases or post-operative complications were identified through keyword matching in the initial reports. The refined Slit-lamp-GPT model achieved BLEU-1 to BLEU-4 scores of 0.67, 0.66, 0.65, and 0.65, respectively, with a CIDEr score of 3.24, a ROUGE-L score of 0.61, and a SPICE score of 0.37. The most frequently identified conditions were cataract (unspecified category) (22.9%), age-related cataract (22.0%), and conjunctival concretion (13.1%). Disease classification metrics showed an overall accuracy of 0.82 and an F1-score of 0.64, with high accuracy (≥0.9) for identifying intraocular lens, conjunctivitis (unspecified category), and chronic conjunctivitis, and high F1-scores (≥0.9) for cataract and age-related cataract. The two ophthalmologists showed a high level of agreement in their quality assessment of 100 reports, with scores of 1.36 for both completeness and correctness. Consistency was also observed in the interactive question-answering scenario, in which 300 generated answers received scores of 1.33, 1.14, and 1.15 for completeness, correctness, and possible harm, respectively.
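The per-condition classification figures above follow the standard confusion-matrix definitions. A minimal sketch of those definitions (an illustrative helper, not the study's code), computing the metrics for one condition from parallel 0/1 label lists:

```python
def classification_metrics(y_true, y_pred):
    """Binary classification metrics for a single condition.

    y_true and y_pred are parallel lists of 0/1 labels
    (1 = condition present / predicted present).
    """
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

    def safe(num, den):
        return num / den if den else 0.0

    precision = safe(tp, tp + fp)
    sensitivity = safe(tp, tp + fn)  # recall
    return {
        "accuracy": safe(tp + tn, len(y_true)),
        "sensitivity": sensitivity,
        "specificity": safe(tn, tn + fp),
        "precision": precision,
        "f1": safe(2 * precision * sensitivity, precision + sensitivity),
    }
```

In a multi-condition setting like this study's, such metrics would be computed once per condition and the overall figures aggregated across conditions.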

Conclusions: This pioneering study introduces the Slit-lamp-GPT model for report generation and question answering, highlighting the potential of large language models to assist ophthalmologists and patients.

This abstract was presented at the 2024 ARVO Annual Meeting, held in Seattle, WA, May 5-9, 2024.
