Abstract
Purpose :
Most variants associated with Age-related Macular Degeneration (AMD) reside in non-coding regions and are likely to exert their effects through regulation of gene expression. Thus, transcriptome studies in AMD hold great potential to identify genes and pathways that could lead to novel insights into the underlying molecular processes. However, accessing transcriptome profiles in large disease and non-disease cohorts is challenging, which presents a significant limitation. Here we apply machine learning methods to uncover gene expression signatures associated with AMD.
Methods :
RNA-seq data from 453 donor retinas, from normal and AMD donors, were used to establish and test the model for AMD classification. The data were randomly split into 70% training data and 30% test data, and models were tuned using Bayesian optimization to find the best model parameters. One hundred iterations were performed, and average model performance was measured. AUROC, sensitivity, specificity, accuracy, and F1 score were used to compare the performance of the models. We also compared model performance against several non-specific random gene lists, as well as against shuffled labels, to ensure the specificity of the features. Finally, the candidates were tested for genetic association in the most recent AMD-GWAS data.
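The evaluation protocol described above (repeated random 70/30 splits, averaged metrics, and a shuffled-label control) can be sketched roughly as follows. This is an illustration under stated assumptions, not the pipeline used in the study: the data are synthetic, a simple nearest-centroid scorer stands in for the actual classifiers, Bayesian optimization of hyperparameters is omitted, and all function names (`repeated_holdout`, `centroid_scores`, `make_toy_data`, and so on) are hypothetical.

```python
import random
from statistics import mean

def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity, accuracy and F1 for binary labels (1 = AMD, 0 = control)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "accuracy": (tp + tn) / len(y_true),
        "f1": 2 * tp / (2 * tp + fp + fn) if 2 * tp + fp + fn else 0.0,
    }

def auroc(y_true, scores):
    """Probability that a random case outscores a random control (ties count 0.5).
    Assumes both classes are present in y_true."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def centroid_scores(X_train, y_train, X_test):
    """Stand-in classifier (an assumption, not the abstract's models):
    score > 0 means the sample is closer to the case centroid."""
    def centroid(label):
        rows = [x for x, y in zip(X_train, y_train) if y == label]
        return [mean(col) for col in zip(*rows)]
    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    c1, c0 = centroid(1), centroid(0)
    return [dist2(x, c0) - dist2(x, c1) for x in X_test]

def repeated_holdout(X, y, n_iter=100, train_frac=0.7, seed=0, shuffle_labels=False):
    """Average test metrics over n_iter random train/test splits.
    With shuffle_labels=True, labels are permuted first as a negative control."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n_iter):
        labels = list(y)
        if shuffle_labels:
            rng.shuffle(labels)
        idx = list(range(len(X)))
        rng.shuffle(idx)
        cut = int(train_frac * len(idx))
        train, test = idx[:cut], idx[cut:]
        scores = centroid_scores([X[i] for i in train],
                                 [labels[i] for i in train],
                                 [X[i] for i in test])
        y_test = [labels[i] for i in test]
        metrics = confusion_metrics(y_test, [1 if s > 0 else 0 for s in scores])
        metrics["auroc"] = auroc(y_test, scores)
        runs.append(metrics)
    return {k: mean(r[k] for r in runs) for k in runs[0]}

def make_toy_data(n=200, n_genes=5, shift=1.5, seed=1):
    """Synthetic expression matrix: cases have each gene shifted by `shift`."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(shift * label, 1.0) for _ in range(n_genes)])
        y.append(label)
    return X, y

X, y = make_toy_data()
real = repeated_holdout(X, y)                       # true labels
null = repeated_holdout(X, y, shuffle_labels=True)  # shuffled-label control
print(real)
print(null)
```

On the toy data, the shuffled-label run should hover near chance while the true-label run should not. The same function could in principle be applied to randomly chosen gene subsets to mimic the random gene list comparison.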
Results :
Most classifiers performed well, achieving 70-80% accuracy. Among the four machine learning models used, eXtreme Gradient Boosting (xgb) linear yielded the highest accuracy (82%) in differentiating advanced AMD from controls based on 81 genes/features. The sensitivity and specificity of gene expression predictions for differentiating AMD from controls were 76% and 81%, respectively. These features were enriched for genes involved in immune response, complement, and extracellular matrix, and were connected to known AMD genes through co-expression networks and gene expression correlation. The Q-Q plot revealed a greater departure from the expected distribution for the candidate genes, suggesting true associations.
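As a rough illustration of how a Q-Q comparison like the one mentioned above is typically constructed, the sketch below pairs sorted observed p-values with the quantiles expected under a uniform null, both on the -log10 scale. The p-values here are made up for demonstration and are not GWAS results; `qq_points` and `mean_departure` are hypothetical helper names, and the mean vertical offset is only one simple way to summarize departure from the diagonal.

```python
import math

def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs under a uniform null.
    Assumes every p-value is strictly greater than 0."""
    obs = sorted(p_values)
    n = len(obs)
    return [(-math.log10((i + 0.5) / n), -math.log10(p)) for i, p in enumerate(obs)]

def mean_departure(p_values):
    """Average vertical distance of the Q-Q points above the diagonal."""
    pts = qq_points(p_values)
    return sum(observed - expected for expected, observed in pts) / len(pts)

n = 81  # illustrative list size, matching the number of features in the abstract
null_p = [(i + 0.5) / n for i in range(n)]  # p-values exactly on the null quantiles
enriched_p = [p ** 2 for p in null_p]       # artificially smaller p-values

print(mean_departure(null_p))      # points sit on the diagonal
print(mean_departure(enriched_p))  # points rise above the diagonal
```

A gene list whose points rise above the diagonal has smaller p-values than a uniform null would produce, which is the pattern usually read as evidence of association.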
Conclusions :
Our work demonstrates the merits of machine learning approaches for disease classification and suggests a key role for gene expression changes in AMD, despite the small study cohort. Gene regulatory networks are sufficiently interconnected, with individual genes having a small impact on disease outcome. Thus, our method provides an opportunity to regain the holistic view of AMD that is lost in experimentally tested reductionist approaches.
This abstract was presented at the 2023 ARVO Annual Meeting, held in New Orleans, LA, April 23-27, 2023.