Abstract
Purpose:
Exome sequencing experiments are useful for identifying the genetic variants that cause rare diseases with extreme genetic heterogeneity, such as retinal degenerative diseases. However, the number of plausible disease-causing variants identified in such studies is often overwhelming, even after expert-guided filtering steps have been applied. We therefore seek to prioritize the lists of candidate variants identified in exome sequencing studies by their likelihood of contributing to a retinal degenerative disease phenotype.
Methods:
We used publicly available data to describe every gene quantitatively in terms of disease-relevant features. Specifically, we selected microarray data of gene expression in 10 tissues of the human eye, RNA-seq data from human retinal tissue and 16 other body tissues, and ChIP-seq data profiling binding sites of the transcription factor CRX across the genome. We developed a novel method, Positive and Unlabeled Learning for Prioritization (PULP), to rank candidate disease-causing variants, and trained it on these data.
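The general idea of positive and unlabeled learning for ranking can be sketched as follows. This is a generic, bagging-style illustration with a nearest-centroid scorer, not the PULP algorithm itself, whose details are not given in this abstract. The gene names, the three feature columns (retina expression, eye-tissue expression, CRX binding), and all feature values are hypothetical.

```python
import random
from statistics import fmean


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [fmean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pu_bagging_scores(features, positives, n_rounds=200, seed=0):
    """Score unlabeled genes; a higher score means closer to known positives.

    Each round draws a random subset of unlabeled genes as provisional
    negatives, then scores the remaining (out-of-bag) unlabeled genes by
    how much nearer they lie to the positive centroid than to the
    provisional-negative centroid. Scores are averaged over rounds.
    """
    rng = random.Random(seed)
    unlabeled = sorted(g for g in features if g not in positives)
    pos_centroid = centroid([features[g] for g in sorted(positives)])
    k = max(1, min(len(positives), len(unlabeled) - 1))
    totals = {g: 0.0 for g in unlabeled}
    counts = {g: 0 for g in unlabeled}
    for _ in range(n_rounds):
        pseudo_neg = set(rng.sample(unlabeled, k))
        neg_centroid = centroid([features[g] for g in pseudo_neg])
        for g in unlabeled:
            if g in pseudo_neg:
                continue  # only out-of-bag genes are scored this round
            x = features[g]
            totals[g] += sq_dist(x, neg_centroid) - sq_dist(x, pos_centroid)
            counts[g] += 1
    return {g: totals[g] / counts[g] for g in unlabeled if counts[g]}


# Hypothetical, pre-scaled features: [retina expr., eye-tissue expr., CRX site]
features = {
    "known_1": [0.90, 0.80, 1.0],
    "known_2": [0.80, 0.90, 1.0],
    "known_3": [0.95, 0.70, 0.0],
    "cand_A": [0.85, 0.80, 1.0],
    "cand_B": [0.10, 0.20, 0.0],
    "cand_C": [0.20, 0.10, 0.0],
    "cand_D": [0.30, 0.30, 0.0],
    "cand_E": [0.15, 0.25, 0.0],
    "cand_F": [0.05, 0.10, 0.0],
}
positives = {"known_1", "known_2", "known_3"}  # hypothetical known disease genes

scores = pu_bagging_scores(features, positives)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy data set only cand_A resembles the known positives, so it is ranked first. A real application would replace the centroid scorer with a stronger classifier and would need feature scaling and validation on held-out known genes.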
Results:
In a Monte Carlo simulation, PULP performed significantly better than random gene ordering (p < 1 × 10⁻⁶). Thirteen published retinitis pigmentosa (RP) linkages, all with identified disease-causing genes, were used to evaluate our system. The causative gene was ranked at the top of the list 62% of the time, and within the top 5 results 77% of the time. These results were highly significant (p = 6 × 10⁻⁶). In addition, we demonstrate that our algorithm outperforms ENDEAVOUR, a current state-of-the-art technique in this field, which had successfully prioritized an RP-causing variant in a familial study of Usher syndrome. Our system also ranked the RP-causing DHDDS variant #1 in a list of 20 candidates from a recent exome study, and ranked a recently identified RP-causing variant in MAK #4 in a list of 348 candidates.
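The top-k rates and a comparison against random gene ordering can be illustrated with a minimal sketch. The test statistic (mean relative rank), the trial count, and the function names are assumptions made for illustration; the abstract does not specify how its simulation was set up. The two rank and list-size pairs (1 of 20, 4 of 348) come from the results above, while the 13-element rank list is hypothetical and chosen only to reproduce the reported 62% and 77%, which are consistent with 8 and 10 of 13 linkages.

```python
import random


def top_k_rate(ranks, k):
    """Fraction of candidate lists whose causative gene ranks within the top k."""
    return sum(r <= k for r in ranks) / len(ranks)


def mean_relative_rank(ranks, list_sizes):
    """Average of rank / list size; lower values mean better prioritization."""
    return sum(r / n for r, n in zip(ranks, list_sizes)) / len(ranks)


def monte_carlo_p_value(ranks, list_sizes, n_trials=100_000, seed=0):
    """Empirical p-value against uniformly random gene ordering.

    Under random ordering the causative gene is equally likely to occupy
    any position in its list, so each trial draws one uniform rank per list.
    """
    rng = random.Random(seed)
    observed = mean_relative_rank(ranks, list_sizes)
    hits = 0
    for _ in range(n_trials):
        simulated = [rng.randint(1, n) for n in list_sizes]
        if mean_relative_rank(simulated, list_sizes) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_trials + 1)


# HYPOTHETICAL ranks for 13 linkages (not the study's actual ranks).
hypothetical_ranks = [1, 1, 1, 1, 1, 1, 1, 1, 3, 5, 9, 14, 27]
top1 = top_k_rate(hypothetical_ranks, 1)
top5 = top_k_rate(hypothetical_ranks, 5)
print(f"top-1: {top1:.0%}, top-5: {top5:.0%}")

# Ranks and list sizes reported for the DHDDS and MAK variants.
exome_ranks = [1, 4]
exome_list_sizes = [20, 348]
p_value = monte_carlo_p_value(exome_ranks, exome_list_sizes)
print(f"empirical p-value vs. random ordering: {p_value:.5f}")
```

For these two lists the exact probability of doing at least as well by chance is (1/20) × (4/348) ≈ 5.7 × 10⁻⁴, so the simulated estimate should land close to that value. Resolving p-values near 10⁻⁶ by simulation requires on the order of millions of trials or more.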
Conclusions:
The PULP retinal degenerative disease model represents a substantial step forward in the identification of retinal degenerative disease variants from exome sequencing studies. Moreover, the technique is an unbiased, generalizable approach to integrating disease-specific quantitative genetic features in order to rank candidate lists by disease association.
Keywords: 539 genetics • 473 computational modeling • 696 retinal degenerations: hereditary