Investigative Ophthalmology & Visual Science
June 2024
Volume 65, Issue 7
Open Access
ARVO Annual Meeting Abstract  |   June 2024
To assess the potential and capabilities of large language models (LLMs) trained on in-domain ophthalmology data.
Author Affiliations & Notes
  • Minhaj Nur Alam
    Electrical & Computer Engineering, UNC Charlotte, Charlotte, North Carolina, United States
  • Tania Haghighi
Bahai Institute for Higher Education, Iran
  • Sina Gholami
    Electrical & Computer Engineering, UNC Charlotte, Charlotte, North Carolina, United States
  • Theodore Leng
    Stanford University School of Medicine, Stanford, California, United States
  • Footnotes
    Commercial Relationships   Minhaj Nur Alam None; Tania Haghighi None; Sina Gholami None; Theodore Leng None
  • Footnotes
    Support  None
Investigative Ophthalmology & Visual Science June 2024, Vol.65, 5656. doi:
Citation: Minhaj Nur Alam, Tania Haghighi, Sina Gholami, Theodore Leng; To assess the potential and capabilities of large language models (LLMs) trained on in-domain ophthalmology data. Invest. Ophthalmol. Vis. Sci. 2024;65(7):5656.

Abstract

Purpose : To assess the potential and capabilities of large language models (LLMs) trained on in-domain ophthalmology data.

Methods : Training LLMs within medical domains has significantly enhanced their performance, leading to more accurate and reliable question-answering systems essential for supporting clinical decision-making and educating patients. However, few studies have investigated the performance of LLMs in the ophthalmic domain, despite growing interest in applying LLMs to various medical tasks. This study presents two LLMs (Mistral and Llama) fine-tuned on a limited corpus of in-domain ophthalmic raw data (journal abstracts, EyeWiki, articles in the ophthalmology category on Wikipedia, and textbooks) for a limited number of training steps: 4,000 for Mistral and 12,400 for Llama. To overcome resource limitations in fine-tuning, we used the QLoRA method, which trains only a small fraction of each model's 7B parameters (9M for Mistral and 12M for Llama). Both fine-tuned LLMs were then compared with OpenAI's GPT-4 model (reportedly 1.7T parameters) on two distinct test sets: a set of expert-designed ophthalmic questions and a subset of the MedQA dataset curated using ophthalmology keywords. Evaluation results were recorded quantitatively (accuracy) and qualitatively (reviewed by an ophthalmologist and by GPT-4).
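A minimal sketch of the QLoRA fine-tuning setup described above, assuming the Hugging Face transformers, peft, datasets, and bitsandbytes stack. The corpus file name, LoRA rank, batch size, and learning rate are illustrative assumptions, not the study's settings; the 4-bit quantized base model, the small trainable-adapter footprint, and the 4,000-step budget for Mistral follow the abstract.

```python
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base = "mistralai/Mistral-7B-v0.1"  # Llama-2-7B would follow the same recipe

# Load the 7B base model in 4-bit: QLoRA keeps the quantized base frozen.
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                         bnb_4bit_compute_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base, quantization_config=bnb, device_map="auto")
model = prepare_model_for_kbit_training(model)

# Low-rank adapters: only a few million parameters are trained
# (the abstract reports ~9M for Mistral and ~12M for Llama).
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM",
                  target_modules=["q_proj", "k_proj", "v_proj", "o_proj"])
model = get_peft_model(model, lora)
model.print_trainable_parameters()

# Hypothetical plain-text corpus of ophthalmic material
# (journal abstracts, EyeWiki, Wikipedia ophthalmology articles, textbooks).
raw = load_dataset("text", data_files={"train": "ophtho_corpus.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

train_ds = raw.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    train_dataset=train_ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    args=TrainingArguments(output_dir="mistral-ophtho-qlora", max_steps=4000,
                           per_device_train_batch_size=4, learning_rate=2e-4,
                           bf16=True, logging_steps=100),
)
trainer.train()
```

The same script, pointed at a Llama 7B checkpoint with max_steps=12400, would correspond to the second model; the adapter weights saved in the output directory are all that changes relative to the frozen base.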

Results : In our model assessment, an ophthalmologist rated GPT-4's responses above Llama's, while Llama outperformed Mistral on the expert-designed questions. When GPT-4 was used to evaluate the models on the same test set against criteria of comprehensiveness, correctness, medical terminology usage, and clarity on a scale from 1 to 10, GPT-4 held only a marginal advantage, scoring an average of 8.2 versus Llama's average of 7.825. On the MedQA test set, the fine-tuned models (Mistral: 0.35, Llama: 0.25) showed a slight improvement in accuracy over the original models (Mistral: 0.34, Llama: 0.22), whereas GPT-4 achieved the highest accuracy at 0.68.
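A minimal sketch of the GPT-4-as-evaluator scoring reported above, assuming the OpenAI Python client. The four criteria and the 1-10 scale follow the abstract; the prompt wording, system message, and answer parsing are illustrative assumptions rather than the study's exact protocol.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

RUBRIC = ("Rate the answer from 1 to 10 on each of: comprehensiveness, correctness, "
          "use of medical terminology, and clarity. Reply with four integers separated by commas.")

def score_answer(question: str, answer: str) -> list[int]:
    """Ask GPT-4 to grade a model's answer to an ophthalmic question on the four criteria."""
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are an expert ophthalmology examiner."},
            {"role": "user", "content": f"Question: {question}\nAnswer: {answer}\n\n{RUBRIC}"},
        ],
        temperature=0,
    )
    # Parse "c, c, m, c" into four integer scores; averaging them per question
    # yields the kind of 1-10 summary scores compared in the results.
    return [int(s) for s in resp.choices[0].message.content.split(",")]
```

Averaging these per-question scores over the expert-designed test set would produce model-level summaries comparable to the 8.2 (GPT-4) and 7.825 (Llama) averages reported above.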

Conclusions : Fine-tuning LLMs with limited parameter updates via the QLoRA method not only showcased the effectiveness of these models but also underscored their adaptability in resource-constrained scenarios, highlighting their practical utility even with limited training time and resources.

This abstract was presented at the 2024 ARVO Annual Meeting, held in Seattle, WA, May 5-9, 2024.
