Abstract
Purpose:
Variant filtering remains a bottleneck for the identification of candidate genes in any Next Generation Sequencing (NGS) project. This is even more critical in genetically heterogeneous diseases, such as retinal degenerations. The purpose of this work was to predict the likelihood that a gene is involved in a dominant disease, based on the distribution of pathogenic and non-pathogenic alleles in the general population.
Methods:
Dominant properties of genes were computed using information available in public databases of exonic variants. We selected 291 autosomal dominant (AD) and 446 autosomal recessive (AR) genes, which were assessed for specific characteristics, such as length, number and type of variants, and conservation. The similarities and differences were scored with parametric and non-parametric tests and Monte Carlo simulations. We evaluated our algorithms by using a list of known genes associated with AD and AR retinitis pigmentosa (RP), as reported in the RetNet database.
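As an illustration of the Monte Carlo comparison mentioned above, the following is a minimal sketch of a permutation test on a per-gene feature. The abstract does not specify which tests or statistics were used, so the choice of the difference of means, the function name `permutation_test`, and the toy values are all assumptions made for illustration only.

```python
import random


def permutation_test(group_a, group_b, n_iter=10000, seed=0):
    """Two-sided Monte Carlo permutation test on the difference of means.

    Hypothetical helper: the test statistic and parameters are assumptions,
    not the procedure used in the study.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)
    n_b = len(pooled) - n_a
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)

    hits = 0
    for _ in range(n_iter):
        # Randomly reassign group labels by shuffling the pooled values.
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            hits += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_iter + 1)


# Toy per-gene feature values (invented, not data from the study).
ad_values = [0.40, 0.50, 0.60, 0.50, 0.45]
ar_values = [1.00, 1.20, 0.90, 1.10, 1.30]
p_value = permutation_test(ad_values, ar_values)
```

A small `p_value` here would indicate that the two toy groups differ more than expected under random relabeling; with real data, the same idea applies to any per-gene feature such as length or variant counts.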
Results:
AR genes carried a higher number of non-synonymous (NS) variants compared to AD genes (p = 0.0035). This is probably because pathogenic recessive alleles are tolerated in a heterozygous state, while dominant ones are not. This effect was even more significant when the per-gene ratio between NS and synonymous variants was considered (p < 10⁻¹⁰). A predictive tool was constructed based on the distribution of these ratios, and tested on ADRP vs. ARRP genes. ADRP genes could be predicted with 30% sensitivity and 100% specificity.
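To make the evaluation metrics concrete, the sketch below classifies genes by their NS/synonymous ratio and computes sensitivity and specificity. The single cutoff, the pseudocount, the function names, and the toy counts are assumptions for illustration; the abstract states only that the tool is based on the distribution of these ratios, not how the decision rule is defined.

```python
def ns_s_ratio(n_nonsyn, n_syn):
    """Per-gene NS/synonymous ratio with a pseudocount of 1 (an assumption)
    to avoid division by zero for genes without synonymous variants."""
    return (n_nonsyn + 1) / (n_syn + 1)


def predict_dominant(ratio, threshold):
    """Hypothetical rule: a low NS/S ratio suggests an AD gene."""
    return ratio < threshold


def sensitivity_specificity(predictions, labels):
    """Return (sensitivity, specificity); True means AD in both lists."""
    pairs = list(zip(predictions, labels))
    tp = sum(1 for p, l in pairs if p and l)
    fn = sum(1 for p, l in pairs if not p and l)
    tn = sum(1 for p, l in pairs if not p and not l)
    fp = sum(1 for p, l in pairs if p and not l)
    return tp / (tp + fn), tn / (tn + fp)


# Toy genes: (NS count, synonymous count, is_AD). Invented values only.
toy_genes = [
    (2, 9, True), (10, 9, True), (14, 9, True),
    (11, 9, False), (19, 9, False), (8, 9, False),
]
threshold = 0.5  # arbitrary illustrative cutoff
predictions = [predict_dominant(ns_s_ratio(ns, s), threshold)
               for ns, s, _ in toy_genes]
labels = [is_ad for _, _, is_ad in toy_genes]
sens, spec = sensitivity_specificity(predictions, labels)
```

In this toy set a strict cutoff flags only one of the three AD genes but none of the AR genes, which shows how a conservative threshold trades sensitivity for specificity.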
Conclusions:
By analyzing specific patterns of variant distribution, we could differentiate AD genes from AR ones. Although our model fails to detect many true positives, it does not produce false positive results, which represent the major obstacle in successful NGS filtering procedures. Analyses of additional features of AD vs. AR genes are currently ongoing, to ascertain whether other elements could improve our final prediction rate.