In this study, we report a general method for systematizing and deploying phenotypes, using GED as a specific proof of principle. There are several notable observations. More than 73.6% of GEDs are caused by single-gene defects, and 68.8% of the GEDs possess at least one ocular feature defined in OMIM. Both one-gene-to-one-disease and one-gene-to-many-disease associations are observed for these ocular diseases. The larger the pleiotropic effects in a genetic disorder, the greater the genomic effort needed to resolve its complexity. Hierarchical clustering on phenotypes can reveal the underlying structure of a disease based on its features. Hence, clinical findings can be digitized into discrete elements within the feature space, much as a transcriptome can be digitized into transcripts. Our analysis of disease subgroups uncovered several hidden groupings, including those related to central nervous system diseases, such as neuropathy, and developmental/congenital conditions. It is not surprising that Joubert syndromes 2 and 14 (OMIM IDs 608091 and 614424) grouped more closely with the central nervous system condition cerebral dysgenesis, neuropathy, ichthyosis, and palmoplantar keratoderma syndrome (OMIM ID 609528) than with Joubert syndromes 17, 18, 2, 3, 6, 7, and 9 (OMIM IDs 614615, 614815, 608091, 608629, 610688, 611560, and 612285, respectively), in which systemic features play a more significant role.
The pathway analysis using the GeneMANIA module in Cytoscape identified complex relationships among numerous GED loci. For example, two well-populated nodes involving the COL11A1 and COL2A1 genes dominate the network for phenotype units ONH3 (glaucoma) and AL11 (myopia; Fig. 7), which belong to different ocular phenotype groups (optic nerve head for glaucoma and axial length for myopia). In this case, disease units that show significant phenotype overlap are often clinically defined as a single entity (e.g., Stickler syndrome I and II, caused by mutations in the COL2A1 and COL11A1 genes, respectively). Also, diseases with noticeable overlap in phenotype units are caused by mutations in loci that fall into a single pathway (extracellular matrix organization; MYOC–LTBP2–PLOD1; Table 3, Supplementary Table S8). In a different scenario, phenotypes in the disease units segregate into separate functional networks despite being classified in the same phenotype group, developmental/congenital condition, as illustrated by the phenotype units DV12 (microphthalmia) and DV29 (hypertelorism; Fig. 7, Table 3, Supplementary Table S8). Interestingly, the top 10 phenotypic units in GEDs vary in the severity of the vision impairment they cause. For example, glaucoma (ONH3) is much more debilitating than ptosis (EL29), although ptosis is slightly more represented than glaucoma as a phenotypic unit in GEDs. In another instance, more GEDs have hypertelorism (DV29) as a phenotype than have microphthalmia (DV12; Table 3). These observations are intriguing because microphthalmia is a more severe pathophenotype in ocular development than hypertelorism. A possible reason is the arrangement of phenotype descriptions in OMIM, where severe, blinding disorders like microphthalmia (DV12) or glaucoma (ONH3) are also presented as phenotype units. In contrast, hypertelorism (DV29) is a craniofacial abnormality associated with many ocular and nonocular diseases and syndromes that have more systemic than ocular features.
One could criticize the use of the top 10 phenotype units for cluster analysis, because a single phenotype unit would mostly gather diseases featuring that specific unit, along with related ones, into a predictable cluster. To address this, we created clusters using the 0-1 matrix containing all ocular phenotypes (Fig. 8, Supplementary Table S6). Similar analyses revealed a biological network from the largest cluster of the all-ocular-phenotype dataset, in which “acetyl CoA biosynthetic process from pyruvate” and “carbohydrate metabolism” emerged as top biological process and top consolidated pathways-2013, respectively (Fig. 8).
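As an illustration of clustering on a 0-1 disease-by-phenotype matrix, the following minimal sketch groups diseases by the phenotype units they share. The toy matrix, the disease names, the Jaccard distance, and the average-linkage rule are all assumptions made for this example; the text above does not state which distance or linkage was used.

```python
from itertools import combinations

# Hypothetical toy data: rows are diseases, columns are ocular phenotype units.
# 1 means the unit is annotated for the disease; the assignments are invented.
UNITS = ["ONH3", "AL11", "DV12", "DV29"]
MATRIX = {
    "disease_A": [1, 1, 0, 0],
    "disease_B": [1, 1, 0, 1],
    "disease_C": [0, 0, 1, 1],
    "disease_D": [0, 0, 1, 0],
}


def jaccard_distance(u, v):
    """Return 1 - |intersection| / |union| for two 0-1 vectors."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return 0.0 if either == 0 else 1.0 - both / either


def average_linkage(matrix, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[name] for name in sorted(matrix)]

    def dist(c1, c2):
        # Average pairwise Jaccard distance between members of two clusters.
        pairs = [(a, b) for a in c1 for b in c2]
        total = sum(jaccard_distance(matrix[a], matrix[b]) for a, b in pairs)
        return total / len(pairs)

    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(clusters)


print(average_linkage(MATRIX, 2))
```

Because every phenotype unit contributes a column, no single unit predetermines the grouping, which is the point of using the full matrix rather than one unit at a time.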
The abundance of neurologic or craniofacial systemic phenotypes in diseases having the top 10 ocular phenotypes (Fig. 5C) points to the spatiotemporal proximity of the eye to the central nervous system and cranium during development. However, the consistent presence of skeletal, cardiovascular, and in some cases gastrointestinal phenotypes in diseases that also have the top 10 ocular phenotypes (Fig. 5C) indicates a possible relationship between specific systemic phenotypes and the ocular ones. Nystagmus, strabismus, and ptosis showed a relatively greater number of associated systemic phenotypes (>500), while microphthalmia, glaucoma, and cataract showed a smaller number (<400; Fig. 5C). This result indicates that more systemic phenotypes are associated with comparatively less severe blinding phenotypes, as vision impairment is much less intense in nystagmus, strabismus, and ptosis than in glaucoma (ONH3), microphthalmia (DV12), or cataract (LE25). Interestingly, for glaucoma (ONH3) the number of skeletal phenotypes (69) is greater than the number of neurologic phenotypes (50). This is intriguing because glaucoma is a neurodegenerative disease, for which more neurologic phenotypes would be expected. When we looked specifically at the diseases having glaucoma as a phenotype, we found seven with more than five skeletal features each, which accounts for the greater number of skeletal features associated with glaucoma. Five of these seven diseases are syndromes associated with multiple phenotypes, with skeletal being the major one. The remaining two are diseases in which skeletal features are composite with ocular phenotypes (Supplementary Table S8). There is little reported evidence to establish a connection between skeletal phenotypes and glaucoma. Similarly, the presence of unusual craniofacial characteristics along the HRAS–RAF1–PTPN11 interaction recovers the known dysregulation of the RAS/mitogen-activated protein kinase (MAPK) pathway. Moreover, specific mutations in PTPN11 have been identified in LEOPARD (multiple lentigines, electrocardiographic conduction abnormalities, ocular hypertelorism, pulmonary stenosis, abnormal genitalia, retardation of growth, and sensorineural deafness) syndrome,11 which suggests that PTPN11 may have more general effects. Phenotype deconstruction can thus be very powerful for testing novel ideas that are mostly unexplored.
Other resources on human disorders are available in public databases such as DisGeNET and HPO.5,6 The primary source of information for both these databases and our study is OMIM. Our study focused primarily on analyzing the phenotype features of ocular diseases, linking the ocular and systemic annotations per disease, and identifying diseases that share individual ocular and systemic features. Our analysis is intended to provide insight into phenotypically similar disorders that share phenotype features, such as Noonan syndrome (Supplementary Table S1). Studying phenotypic similarities is of great importance because it can reveal groups of genes in pathways or biochemical modules in which dysfunction could lead to similar phenotypic consequences.12
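Identifying diseases that share individual features can be sketched as a simple inverted index from feature to disease. The annotations below are invented placeholders rather than OMIM records, and the function name `shared_features` is hypothetical; this is one plausible way to do the lookup, not necessarily the procedure used in the study.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical annotations: disease -> set of phenotype features.
# Names and assignments are invented for illustration, not taken from OMIM.
ANNOTATIONS = {
    "disease_A": {"hypertelorism", "ptosis", "pulmonary stenosis"},
    "disease_B": {"hypertelorism", "ptosis", "short stature"},
    "disease_C": {"glaucoma", "short stature"},
}


def shared_features(annotations):
    """Map each disease pair to the set of features both diseases carry."""
    # Invert the annotations: feature -> diseases annotated with it.
    by_feature = defaultdict(set)
    for disease, features in annotations.items():
        for feature in features:
            by_feature[feature].add(disease)

    # Every pair of diseases listed under one feature shares that feature.
    shared = defaultdict(set)
    for feature, diseases in by_feature.items():
        for pair in combinations(sorted(diseases), 2):
            shared[pair].add(feature)
    return dict(shared)


pairs = shared_features(ANNOTATIONS)
for pair, features in sorted(pairs.items()):
    print(pair, sorted(features))
```

Pairs with no feature in common do not appear in the result, so the output lists only candidate phenotypically similar disorders.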
Precise clinical observation defines the strength of phenotype clustering. One limitation of this study is the potential incompleteness of the available phenotypic data. OMIM describes the majority of human Mendelian syndromes in detail; however, computational analysis of its data has so far been difficult because it lacks a controlled vocabulary. A database containing phenotype annotations for all organ systems and a more robust disease catalog would enable more powerful analyses. The association between disease, gene, and phenotype could be established at the pathophysiological level once gene expression analyses become available for the different components/tissues of the eye; the lack of such data prevented us from including them in these analyses. Therefore, we attempted to form categories and minimize feature deconstruction to reduce inaccurate observations and bias in the phenotypes reported in clinics for ocular diseases.
Phenotype deconvolution is an effective approach to explore and catalog relationships among human diseases through their underlying features. This approach could be applicable to diseases of any organ. Furthermore, relationships among the phenotype, genotype, and feature spaces can provide insight into the biological mechanisms genetically linked to human disease.