Abstract
Purpose:
To discover and prioritize human retinal disease genes using a novel genome-wide method and dataset.
Methods:
Both mRNA expression profiles and protein subcellular localization offer important clues to the potential function of genes. To obtain both kinds of information for the retina, we performed paired RNA-seq on mouse retina and mass spectrometry (MS) analysis on purified outer segments (OS). We developed a novel proteogenomic analysis method that integrates the paired RNA-seq and MS/MS data, and applied a Bayesian model to estimate the probability that each protein is present. Finally, these data sets were combined with exome sequencing results from patients with retinal diseases to discover candidate disease genes.
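As a rough illustration of how a Bayesian estimate of protein presence can work, the sketch below applies Bayes' rule in log-odds form. It is not the model used in this study: the function name, the RNA-seq-informed prior, and the assumption that peptide-spectrum matches contribute independent likelihood ratios are all hypothetical simplifications.

```python
import math

def posterior_presence(prior, likelihood_ratios):
    """Posterior probability that a protein is present (illustrative only).

    prior: assumed prior probability of presence, e.g. informed by the
        paired RNA-seq expression level (hypothetical choice).
    likelihood_ratios: one value per peptide-spectrum match, each equal to
        P(evidence | present) / P(evidence | absent); assumed independent.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    if any(lr <= 0.0 for lr in likelihood_ratios):
        raise ValueError("likelihood ratios must be positive")
    # Bayes' rule in log-odds form: posterior odds = prior odds * product of ratios.
    log_odds = math.log(prior / (1.0 - prior))
    for lr in likelihood_ratios:
        log_odds += math.log(lr)
    # Convert log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))
```

With no spectral evidence the posterior equals the prior, and each supporting match with a ratio above 1 raises it.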
Results:
Compared with previous proteomic studies, our method is highly sensitive and specific, identifying three times more proteins (>3000) in the mouse retina OS. In addition, by comparing protein concentrations between the OS and the rest of the retina, protein subcellular localization in the retina can be predicted with 90% accuracy. Strikingly, known retinal disease-causing genes are highly enriched in this gene set, which includes about 60% of them. A Bayesian statistical model built on this dataset was used to predict the relevance of each gene to retinal diseases across the genome. The predictions performed well: supporting experimental evidence can be found for a significant portion of the top-ranking genes. Finally, this tool was used to prioritize novel retinal disease gene discovery by combining it with exome sequencing results from our retinal disease patient cohort. Validation of the top-ranking candidate genes is currently underway.
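The comparison of protein concentration between the OS and the rest of the retina can be pictured as a log-ratio classifier. The sketch below is a minimal illustration under stated assumptions, not the procedure used in this study: the function name, the pseudocount, the two-fold threshold, and the class labels are all hypothetical.

```python
import math

def predict_localization(os_abundance, rest_abundance,
                         log2_threshold=1.0, pseudocount=1.0):
    """Classify a protein by its OS versus rest-of-retina abundance.

    os_abundance, rest_abundance: non-negative abundance estimates in the
        same units (hypothetical inputs, e.g. normalized spectral counts).
    log2_threshold: assumed cutoff on the log2 ratio (1.0 = two-fold).
    pseudocount: assumed small constant that avoids division by zero.
    """
    if os_abundance < 0 or rest_abundance < 0:
        raise ValueError("abundances must be non-negative")
    log2_ratio = math.log2((os_abundance + pseudocount) /
                           (rest_abundance + pseudocount))
    if log2_ratio >= log2_threshold:
        return "OS-enriched"
    if log2_ratio <= -log2_threshold:
        return "OS-depleted"
    return "ambiguous"
```

In practice a cutoff of this kind would be tuned against proteins of known localization; the default here is only a placeholder.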
Conclusions:
Our study is the first to integrate proteomics and transcriptomics to build a statistical model for retinal disease gene prediction. Combined with animal model and human patient exome sequencing data, this novel tool can greatly facilitate gene functional studies and novel disease gene discovery.