Abstract
Purpose :
The ability to understand and affect the course of complex, multi-system diseases has been limited by a lack of well-designed, high-quality, large, ethically sourced, and inclusive multimodal datasets. Our grand challenge in AI-READI (aireadi.org) is to generate a dataset that facilitates artificial intelligence/machine learning (AI/ML) approaches for gaining insights into type 2 diabetes mellitus (T2DM). Our project's theme is salutogenesis, a concept that focuses on the factors responsible for health and well-being rather than on disease pathogenesis. The project itself is hypothesis-agnostic. This dataset will enable researchers to apply a wide range of AI/ML methods to evaluate hypotheses about T2DM.
Methods :
Our approach is to generate a dataset on 4000 persons > 40 years of age that is balanced for 4 racial/ethnic groups (Asians, Blacks, Hispanics, Whites), both sexes, and 4 categories of T2DM (no diabetes, prediabetes/lifestyle-controlled, controlled by oral medications/non-insulin injectables, insulin-dependent). Data collection sites include Birmingham AL, San Diego CA, and Seattle WA. The variable domains of the dataset are diverse, encompassing many biomedical and behavioral aspects of health and functioning often impacted in T2DM, including retinal imaging, vision, cognition, body mass index, blood/urine testing, physical activity tracking, EKG, continuous glucose monitoring, environmental sensor exposure, social determinants of health, and more (see Figure 1). In addition, whole genome sequencing will be performed on biospecimens. Blood derivatives (including serum, plasma, buffy coats, RNA, DNA, and peripheral blood mononuclear cells) will be banked and made available to researchers for future proteomics, metabolomics, and other research.
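As a rough illustration of the recruitment arithmetic, the sketch below enumerates the strata implied by the three balancing factors and divides the 4000-person target evenly among them. It assumes that "balanced" means equal allocation across a fully crossed design, which the abstract does not state; the sex labels and variable names are likewise illustrative assumptions.

```python
from itertools import product

# Target enrollment and balancing factors as listed in Methods.
TOTAL_PARTICIPANTS = 4000
race_ethnicity = ["Asian", "Black", "Hispanic", "White"]
sex = ["female", "male"]  # assumed labels; the abstract says only "both sexes"
t2dm_category = [
    "no diabetes",
    "prediabetes/lifestyle-controlled",
    "oral medications/non-insulin injectables",
    "insulin-dependent",
]

# Assumption: a fully crossed design with equal allocation per stratum.
strata = list(product(race_ethnicity, sex, t2dm_category))
per_stratum, remainder = divmod(TOTAL_PARTICIPANTS, len(strata))

print(f"strata: {len(strata)}")
print(f"participants per stratum: {per_stratum} (remainder {remainder})")
```

Under this assumed equal allocation there are 4 x 2 x 4 = 32 strata with 125 participants each; the actual allocation used by the study may differ.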
Results :
Data will be standardized and optimized for AI/ML research and made available to researchers through either a public-access or a controlled-access database, depending on the variables requested. AI-READI data will be released on an ongoing basis, with the first release anticipated in 2024. Data collection is taking place from July 2023 through August 2026.
Conclusions :
We aim to create a model for developing a diverse and representative dataset optimized for AI/ML research and accessible to researchers. Use of this flagship dataset may uncover new insights into T2DM, including diabetic eye disease.
This abstract was presented at the 2024 ARVO Annual Meeting, held in Seattle, WA, May 5-9, 2024.