Abstract
Purpose:
The purpose of this research was to develop a strategy for identifying candidate genes for genetically heterogeneous diseases based upon correlation of gene expression.
Methods:
The squared Pearson correlation coefficient (r²) was used to identify genes whose expression patterns correlate with those of known disease-causing genes. Multiple publicly available gene expression data sets for human, mouse, and rat were analyzed, including an experiment recently published by our group that profiled expression in the eyes of 120 rats. Performance was measured by the ability to accurately prioritize known disease-causing genes, using the 11 known Bardet-Biedl Syndrome genes and the 33 known retinitis pigmentosa genes. Specifically, we assessed the enrichment of known disease genes among the top 1% of genes (roughly 300, depending on microarray platform) most correlated with the list of known disease genes. When assessing the known genes, self-correlations (e.g., NRL to NRL) were excluded, making the results a conservative estimate of our ability to identify correlated genes.
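The ranking step described above can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the toy expression profiles, and the choice to score each gene by its best r² against any known disease gene are assumptions made for illustration only, since the abstract does not state how correlations to multiple known genes were combined.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def rank_candidates(expression, known_genes, top_fraction=0.01):
    """Score every gene by its best r^2 against the known disease genes.

    expression: dict mapping gene name -> list of expression values
    known_genes: names of known disease genes
    Assumption: the per-gene score is the maximum r^2 over known genes.
    """
    scores = {}
    for gene, profile in expression.items():
        best = 0.0
        for known in known_genes:
            # Exclude self-correlations (e.g., NRL to NRL), as in the text.
            if known == gene or known not in expression:
                continue
            r = pearson_r(profile, expression[known])
            best = max(best, r * r)
        scores[gene] = best
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_top = max(1, int(len(ranked) * top_fraction))
    return ranked[:n_top], scores


# Hypothetical toy data: four genes measured across five samples.
expression = {
    "KNOWN_A": [1, 2, 3, 4, 5],
    "KNOWN_B": [2, 4, 6, 8, 11],
    "CAND_1": [1.1, 2.0, 2.9, 4.2, 5.1],
    "CAND_2": [5, 1, 4, 2, 3],
}
top, scores = rank_candidates(expression, ["KNOWN_A", "KNOWN_B"], top_fraction=0.5)
```

Because the score is r², strongly anti-correlated genes rank as highly as positively correlated ones. Whether that is desirable depends on the biological question.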
Results:
Of the 11 known Bardet-Biedl Syndrome (BBS) genes, 4 were present in the most correlated 1% of all genes in an initial analysis (p<1e-4). Similarly, based upon correlation of expression from a single experiment, 11 of the 33 retinitis pigmentosa (RP) genes represented in the experiment were found in the top 1% of all genes (p<1e-6). Of particular interest, several of the other genes best correlated with the known RP genes lie in intervals previously linked to autosomal recessive RP and represent excellent candidate RP genes.
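One way to gauge enrichment of this kind is a one-sided hypergeometric test. The abstract does not say which test produced the reported p-values, so the sketch below is an assumption and not the authors' method. The total of 30,000 genes is also an assumption, inferred from the statement that the top 1% is roughly 300 genes.

```python
from math import comb


def enrichment_p_value(total_genes, top_genes, known_genes, known_in_top):
    """Upper-tail hypergeometric probability.

    Probability of seeing at least `known_in_top` of the `known_genes`
    among the `top_genes` when the top list is drawn at random from
    `total_genes`.
    """
    upper = min(known_genes, top_genes)
    tail = sum(
        comb(top_genes, k) * comb(total_genes - top_genes, known_genes - k)
        for k in range(known_in_top, upper + 1)
    )
    return tail / comb(total_genes, known_genes)


# Assumed platform size: 30,000 genes, so the top 1% is 300 genes.
p_bbs = enrichment_p_value(30000, 300, 11, 4)   # 4 of 11 BBS genes in the top 1%
p_rp = enrichment_p_value(30000, 300, 33, 11)   # 11 of 33 RP genes in the top 1%
```

The exact values depend on the platform size and on how the exclusion of self-correlations is handled, so they should be read as illustrative only.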
Conclusions:
This procedure will be a useful tool in identifying and prioritizing candidate genes, particularly in genetically heterogeneous diseases.
Keywords: gene/expression • gene microarray