June 2024
Volume 65, Issue 7
Open Access
ARVO Annual Meeting Abstract  |   June 2024
Clinical Dataset Structure: A Universal Standard for Structuring Clinical Research Data and Metadata
Author Affiliations & Notes
  • Bhavesh Patel
    FAIR Data Innovations Hub, California Medical Innovations Institute, San Diego, California, United States
  • Sanjay Soundarajan
    FAIR Data Innovations Hub, California Medical Innovations Institute, San Diego, California, United States
  • Aydan Gasimova
    FAIR Data Innovations Hub, California Medical Innovations Institute, San Diego, California, United States
  • Nayoon Gim
    Department of Ophthalmology, University of Washington, Seattle, Washington, United States
    Department of Bioengineering, University of Washington, Seattle, Washington, United States
  • Jamie Shaffer
    Department of Ophthalmology, University of Washington, Seattle, Washington, United States
    The Roger and Angie Karalis Johnson Retina Center, Seattle, Washington, United States
  • Aaron Y Lee
    Department of Ophthalmology, University of Washington, Seattle, Washington, United States
    The Roger and Angie Karalis Johnson Retina Center, Seattle, Washington, United States
  • Footnotes
    Commercial Relationships: Bhavesh Patel None; Sanjay Soundarajan None; Aydan Gasimova None; Nayoon Gim None; Jamie Shaffer None; Aaron Lee Genentech, Verana Health, Code C (Consultant/Contractor), US Food and Drug Administration, Code E (Employment), Santen, Carl Zeiss Meditec, Novartis, Code F (Financial Support), Topcon, Code R (Recipient)
  • Footnotes
    Support  NIH grant OT2OD032644
Investigative Ophthalmology & Visual Science June 2024, Vol.65, 2418. doi:

      Bhavesh Patel, Sanjay Soundarajan, Aydan Gasimova, Nayoon Gim, Jamie Shaffer, Aaron Y Lee; Clinical Dataset Structure: A Universal Standard for Structuring Clinical Research Data and Metadata. Invest. Ophthalmol. Vis. Sci. 2024;65(7):2418.

      © ARVO (1962-2015); The Authors (2016-present)

Abstract

Purpose : Clinical research studies typically collect multiple data modalities, such as surveys, vitals, and eye images. There is currently no consensus on how to structure such multimodal data into a consistently organized dataset that humans and machines can easily reuse, in line with the FAIR (Findable, Accessible, Interoperable, Reusable) Principles. We addressed this issue in the Artificial Intelligence Ready and Equitable Atlas for Diabetes Insights (AI-READI) project by developing the Clinical Dataset Structure (CDS), a standard approach for organizing multimodal clinical research data and metadata at the root level.

Methods : We first reviewed modality-specific standard structures, such as the Brain Imaging Data Structure (BIDS) for neuroimaging data and the SPARC Data Structure (SDS) for neuromodulation-related data, to learn from their approaches. In parallel, we reviewed popular metadata schemas, such as the DataCite and ClinicalTrials.gov schemas, to identify critical metadata elements and ways to structure them. We also reviewed metadata standards specific to artificial intelligence (AI) and machine learning (ML), such as Datasheets for Datasets and Data Cards. Using this information, we established the first version of the CDS. We then applied the CDS to the pilot data from the AI-READI study and evaluated it for alignment with the FAIR Principles relevant to data and metadata structures.

Results : The CDS mandates organizing data into one folder per datatype, named according to a set convention; within each folder, the applicable standard structure must then be followed. The CDS requires several human-friendly metadata files at the root level, such as a README file that provides an overview, a standard datasheet, and a LICENSE file with reuse terms. It also specifies machine-friendly metadata files, such as a DataCite-compliant dataset description file and a ClinicalTrials.gov-inspired study description file. Evaluation of the AI-READI pilot dataset showed that the CDS makes datasets compliant with all the relevant elements of the FAIR Principles.
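
The root-level layout described in the Results can be illustrated with a small check. The following Python sketch is not part of the CDS itself; it only tests whether a dataset root contains the kinds of metadata files named above and lists the datatype folders it finds. The abstract names the kinds of files but not their exact names, so every file name and folder name in the sketch (for example `datasheet.md` or `retinal_photography`) is a hypothetical placeholder.

```python
from pathlib import Path
import tempfile

# Hypothetical file names: the abstract lists the kinds of root-level files
# (README, datasheet, LICENSE, dataset description, study description)
# but does not give their exact names or extensions.
REQUIRED_ROOT_FILES = (
    "README.md",
    "LICENSE.txt",
    "datasheet.md",
    "dataset_description.json",
    "study_description.json",
)


def check_root_layout(root, required=REQUIRED_ROOT_FILES):
    """Return (missing_files, datatype_folders) for a dataset root.

    missing_files: required root-level files that are absent, sorted by name.
    datatype_folders: names of the sub-folders found at the root, sorted.
    """
    root = Path(root)
    missing = sorted(name for name in required if not (root / name).is_file())
    folders = sorted(p.name for p in root.iterdir() if p.is_dir())
    return missing, folders


def build_sample_dataset(root, datatypes=("retinal_photography", "survey")):
    """Create an empty sample layout; the datatype names are made up."""
    root = Path(root)
    for name in REQUIRED_ROOT_FILES:
        (root / name).write_text("placeholder\n", encoding="utf-8")
    for datatype in datatypes:
        (root / datatype).mkdir()
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = build_sample_dataset(tmp)
        missing, folders = check_root_layout(sample)
        print("missing root files:", missing)
        print("datatype folders:", folders)
```

A check like this only covers the root level. Whether the contents of each datatype folder follow the applicable modality-specific standard would need a separate validator for that standard.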

Conclusions : The CDS provides a simple and intuitive way to organize clinical research data and metadata at the root level so as to optimize their interoperability and reusability, especially for AI/ML applications.

This abstract was presented at the 2024 ARVO Annual Meeting, held in Seattle, WA, May 5-9, 2024.

 

Illustration of a sample dataset organized at the root level according to the CDS.
