Investigative Ophthalmology & Visual Science
June 2024
Volume 65, Issue 7
Open Access
ARVO Annual Meeting Abstract  |   June 2024
Evaluating the efficacy of a large language model in screening ophthalmology articles for systematic reviews
Author Affiliations & Notes
  • Erkin Ötles
    University of Michigan, Ann Arbor, Michigan, United States
  • Rithambara Ramachandran
    University of Michigan, Ann Arbor, Michigan, United States
  • Ming-Chen Lu
    University of Michigan, Ann Arbor, Michigan, United States
  • Paula Anne Newman-Casey
    University of Michigan, Ann Arbor, Michigan, United States
  • Footnotes
    Commercial Relationships   Erkin Ötles HTD Health, Fourier Health, Code C (Consultant/Contractor), Peers Health, Code O (Owner), Patent pending for the University of Michigan for an artificial intelligence based approach for the dynamic prediction of health states for patients with occupational injuries, Code P (Patent); Rithambara Ramachandran None; Ming-Chen Lu None; Paula Anne Newman-Casey None
  • Footnotes
    Support  E.Ö. was supported by the National Institutes of Health (grant no. T32GM007863)
Investigative Ophthalmology & Visual Science June 2024, Vol.65, 344.

      Erkin Ötles, Rithambara Ramachandran, Ming-Chen Lu, Paula Anne Newman-Casey; Evaluating the efficacy of a large language model in screening ophthalmology articles for systematic reviews. Invest. Ophthalmol. Vis. Sci. 2024;65(7):344.

      © ARVO (1962-2015); The Authors (2016-present)

Abstract

Purpose : The accuracy and efficiency of article screening are crucial in conducting high-quality systematic reviews. This study assesses the utility of ChatGPT-3.5, a large language model (LLM), in streamlining this screening process.

Methods : Two reviewers (RR and EÖ) screened a large corpus of ophthalmology articles returned from five online databases (MEDLINE, Embase, CINAHL Complete, Scopus, and Web of Science) for a prespecified query. Each article was assessed by its title and abstract for relevance to a systematic review under predetermined inclusion and exclusion criteria, and all interrater discrepancies were resolved by discussion to yield a final 'relevant' or 'irrelevant' label. The same articles were then processed by an LLM via the ChatGPT-3.5 API (OpenAI, gpt-3.5-turbo-0613, accessed November 28, 2023). A custom Python script sent one structured message per article, with each message opened as a new chat to keep classifications independent of prior state; this pipeline is depicted in Figure 1. Each structured message contained the inclusion and exclusion criteria, the article's title, abstract, publication year, and journal name, followed by a request to classify the paper as 'relevant' or 'irrelevant.' The LLM's performance was evaluated by sensitivity, specificity, and accuracy against the gold-standard human assessment, and total and per-article processing times were recorded.
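A minimal sketch of the per-article screening call described above. The prompt wording, field names, and the `build_prompt`/`classify` helpers are illustrative assumptions, not the authors' actual script; only the model identifier (gpt-3.5-turbo-0613) comes from the abstract:

```python
# Illustrative sketch of the screening pipeline; prompt text and helper
# names are assumptions, not the authors' code.

def build_prompt(article: dict, inclusion: str, exclusion: str) -> str:
    """Assemble the structured message for one article."""
    return (
        f"Inclusion criteria: {inclusion}\n"
        f"Exclusion criteria: {exclusion}\n"
        f"Title: {article['title']}\n"
        f"Abstract: {article['abstract']}\n"
        f"Year: {article['year']}\n"
        f"Journal: {article['journal']}\n"
        "Classify this paper as 'relevant' or 'irrelevant'."
    )

def classify(article: dict, inclusion: str, exclusion: str) -> str:
    """Send one article as a fresh chat so classifications stay independent."""
    from openai import OpenAI  # requires the openai package and OPENAI_API_KEY
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-3.5-turbo-0613",
        messages=[{"role": "user",
                   "content": build_prompt(article, inclusion, exclusion)}],
    )
    text = response.choices[0].message.content.lower()
    # Check 'irrelevant' first, since 'relevant' is a substring of it.
    return "irrelevant" if "irrelevant" in text else "relevant"
```

Sending each article as a new chat, as the abstract notes, prevents earlier classifications from influencing later ones through conversation history.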

Results : Of the 547 articles evaluated, 124 (22.67%) were labeled 'relevant' by the human raters and 419 (76.60%) by the LLM. The LLM achieved a sensitivity of 98.39%, a specificity of 29.79%, and an accuracy of 45.34%; the confusion matrix is shown in Figure 2. Total LLM processing time was 37.2 minutes (mean 4.08 seconds per article, standard deviation 44.43 seconds).
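The reported percentages can be back-calculated into confusion-matrix counts (these counts are inferred from the stated metrics, not read directly from Figure 2):

```python
# Confusion-matrix counts consistent with the abstract's percentages
# (inferred back-calculation, not taken directly from Figure 2).
tp, fn = 122, 2      # 124 human-labeled relevant articles
tn, fp = 126, 297    # 423 human-labeled irrelevant articles

sensitivity = tp / (tp + fn)                 # recall on relevant articles
specificity = tn / (tn + fp)                 # recall on irrelevant articles
accuracy = (tp + tn) / (tp + fn + tn + fp)   # overall agreement

print(round(100 * sensitivity, 2))  # 98.39
print(round(100 * specificity, 2))  # 29.79
print(round(100 * accuracy, 2))     # 45.34
```

Note that the LLM's 419 'relevant' calls correspond to tp + fp = 122 + 297, matching the reported count.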

Conclusions : The LLM demonstrated high sensitivity, suggesting its potential utility in identifying relevant literature for systematic reviews. This sensitivity, combined with its speed, indicates that LLMs could serve as an initial screening tool for article inclusion. However, the lower specificity and accuracy highlight the need for human oversight. This study underscores both the promise and the current limitations of using LLMs for academic article screening, suggesting a complementary role alongside human expertise.

This abstract was presented at the 2024 ARVO Annual Meeting, held in Seattle, WA, May 5-9, 2024.

 

 
