May 2016
Volume 57, Issue 6
Open Access
Genetics  |   May 2016
Disease-Phenotype Deconvolution in Genetic Eye Diseases Using Online Mendelian Inheritance in Man
Author Affiliations & Notes
  • Priyanka Pandey
    National Institute of Biomedical Genomics Kalyani, West Bengal, India
  • Moulinath Acharya
    National Institute of Biomedical Genomics Kalyani, West Bengal, India
  • Correspondence: Moulinath Acharya, National Institute of Biomedical Genomics, Netaji Subhas Sanatorium 2nd Floor, P.O. N.S.S, Kalyani, West Bengal, India 741251; ma1@nibmg.ac.in
  • Priyanka Pandey, National Institute of Biomedical Genomics, Netaji Subhas Sanatorium 2nd Floor, P.O. N.S.S, Kalyani, West Bengal, India 741251; pp1@nibmg.ac.in
Investigative Ophthalmology & Visual Science May 2016, Vol.57, 2895-2904. doi:10.1167/iovs.15-18057

      Priyanka Pandey, Moulinath Acharya; Disease-Phenotype Deconvolution in Genetic Eye Diseases Using Online Mendelian Inheritance in Man. Invest. Ophthalmol. Vis. Sci. 2016;57(6):2895-2904. doi: 10.1167/iovs.15-18057.

      © ARVO (1962-2015); The Authors (2016-present)

Abstract

Purpose: Capturing organ-specific phenomes in genetic diseases is an uphill task for the eye as it comprises tissue types derived from all three germinal layers. We attempted to deconstruct genetic eye diseases (GEDs) into primary phenotypic features, to understand the complex genome-phenome relationship in GEDs.

Methods: Using phenotype, molecular basis, and gene description features in OMIM as a primary resource, we analyzed gene-phenotype information. All ocular and systemic phenotypes were categorized and ranked based on occurrence. Clustering was performed on shared ocular features to identify genetic interactions and the largest cluster of each phenotype was used for functional analyses.

Results: We collected 527 GEDs associated with 440 unique protein-coding genes. We indexed 787 ocular and 3094 systemic features, for an average of 2.17 ocular and 8.14 systemic features, respectively, per disease unit. The most common ocular features included nystagmus, hypertelorism, and myopia, while neurological and skeletal features were the most common systemic groups associated with GEDs. Functional analyses revealed pathways relevant to GEDs (e.g., extracellular matrix organization in the ONH3 [glaucoma] phenotype cluster and protein metabolism in the EOM35 [nystagmus] phenotype cluster).

Conclusions: Our work imparts a structure in dissecting GEDs into unique phenotypes to study the relationship between genes and diseases involving the eye.

Diseases are conventionally defined by clinical signs and symptoms, and a large number of abnormal phenotypic features are considered in connection with a single disease or a small subset of diseases. This method is convenient for practicing clinicians, although it obscures much of the phenotypic detail that could be catalogued by insightful and unbiased clinical investigation. With the progress in human genomics research, one of the primary obstacles in clinical research is the lack of systematic investigation of the interconnectivity between disease genotype and clinical phenotype. Despite massive improvements in genomic technologies during the past few years, it is still an uphill task to capture the set of all disease-related phenotypes expressed in a specific organ (i.e., the disease phenome). One such attempt has been made for genetic skin diseases, where deconvolving human disease into distinctive features created a third module that interacts with both disease and genotype.1 Briefly, using cutaneous and noncutaneous clinical findings associated with genetic skin diseases from the Online Mendelian Inheritance in Man (OMIM) database, the authors provided a framework for analyzing medical disorders that can aid in the organization and elucidation of biological mechanisms related to human disease.1 Scharfe et al.2 applied a similar approach to human mitochondria and clinical disease phenotypes. In this study, we intend to investigate whether this approach carries over to other genetic diseases. Here we take genetic eye diseases to understand the complex relationship between genome and phenome in ocular diseases using a disease deconvolution approach. Since the eye is a unique sensory organ comprising tissue types from all three germinal layers, it is an excellent structure in which to study gene-disease-phenotype interconnectivity. We considered gene, disease, and phenotype as three separate entities that can interact with each other. 
Considering the disease-phenotype interactions, cluster analysis was done on a specific phenotype unit to give rise to disease clusters. This concept can further be expanded as OMIM contains mostly inherited human diseases, where each disease is connected to a gene. This enables us to create a biological network of genes from the clusters and identify biological processes and pathways associated with a specific disease cluster. A brief outline of our study design is depicted in Figure 1A. 
Figure 1
 
Workflow of disease deconstruction, OMIM analysis, and filtering in genetic ocular diseases. (A) Three interactive levels are depicted as genes, diseases, and phenotypes. Clinical histories, symptoms, and findings constitute the phenotype (p1, p2, … pn) level while disorders and genetic loci constitute the disease (d1, d2, … dn) and gene (g1, g2, … gn) levels. Pleiotropic genes are designated as multiple edges from a single “g” node to multiple “d” nodes. Phenotype overlaps in diseases are shown between the “d” and “p” levels. Cluster analysis was performed considering individual or all phenotype units in the “p” level. Biological networks were obtained from clusters with corresponding disease and gene information. (B) The OMIM database was searched using “eye” as a keyword, which yielded 1308 entries in the 6 separate categories listed here. Three standard filters were applied to achieve the minimum number of entries for PMU and the maximum number of entries matched between PMK and GD. (C) Bar diagram showing the original search result and the results obtained using different filters. Among them, “OMIM records linked to gene specific displays of SNPs” resulted in 527 entries, equal in PMK and GD, and none for PMU. Hence, 527 entries were taken for further analysis.
An argument could be made against choosing OMIM to study disease deconvolution, especially in the context of complex human diseases, as OMIM mostly focuses on inherited human diseases and the majority of these are single-gene disorders. Conversely, OMIM is the most comprehensive genetic disease database to date. We understand that the eye diseases listed in OMIM would mostly be single-gene diseases with multiple shared phenotypes among them. In this study, we aimed to conduct a comprehensive investigation of phenotype overlaps in Mendelian eye diseases for both ocular and systemic features, which in turn would enable us to identify common biological processes in relatively unrelated ocular and nonocular diseases. The natural tendency to focus on organ-specific disease management usually ignores phenotypic signals at other, apparently unrelated sites in the human body. For instance, a number of nonretinal diseases are associated with diabetes3 and systemic inflammation due to immunoactivation is functionally linked to age-related macular degeneration.4 The phenomic architecture can therefore facilitate the understanding of molecular mechanisms related to genetic eye diseases. The rationale of this study is therefore to identify and capture the entire disease phenome for genetic eye diseases, to better understand gene-disease-phenotype interconnectivity, to identify global and local disease clusters with shared ocular phenotypes, to identify pathways from disease clusters, and to identify systemic phenotypes and associated disorders for specific ocular diseases. 
Methods
Data Retrieval Using OMIM
We began our search in OMIM using “eye” as a keyword, which resulted in 1308 entries (as of November 2013) under six OMIM categories: known phenotype description and molecular basis; known gene descriptions; known gene and phenotypes; known phenotype description or locus, molecular basis; other, mainly phenotypes with suspected Mendelian basis; and microRNA (Figs. 1B, 1C). We considered three major categories of OMIM for further filtering: phenotype description, molecular basis known (PMK); phenotype description or locus, molecular basis unknown (PMU); and gene description (GD). We used several built-in OMIM filters to obtain a minimum number of PMU entries and a maximum number of entries matched between PMK and GD, thereby minimizing entries with unknown molecular bases or loci. Finally, we retained 527 disease entries, with equal numbers of PMK and GD entries, as the primary dataset for all analyses in this study (Figs. 1B, 1C). 
Phenotype Curation and Deconvolution
Each OMIM entry referred to an individual disease. We checked for diseases reported to be caused by mutations in more than one gene (i.e., polygenic diseases) and for single genes causing multiple diseases. To deconstruct each disease, we extracted its elemental features, both ocular and systemic (e.g., ocular findings, neurological findings, cardiovascular findings). 
Using the file transfer protocol site of OMIM, “omim.txt,” “mim2gene.txt,” “morbidmap.txt,” “genemap2.txt,” and “genemap.key” were downloaded. Custom-written Perl scripts were used to format and extract information from these files to create our own repository, described in Supplementary Table S1. Information such as ocular phenotype description, mode of inheritance, gene symbols, gene ID, disease name, MIM ID, and description of systemic phenotypes in the field “eye” was extracted from the above-mentioned files. The total number of genes causing the 527 ocular diseases was obtained (Supplementary Table S1). 
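The extraction step can be illustrated with a minimal sketch. It is written in Python rather than the Perl used in the study, and the three-column layout (MIM ID, gene symbol, disease name), the helper name `parse_gene_disease_table`, and the sample rows are assumptions made for illustration only; they do not reproduce the actual OMIM file formats or the authors' scripts.

```python
import csv
import io
from collections import defaultdict

# Hypothetical three-column layout: MIM ID, gene symbol, disease name.
# Lines starting with '#' are treated as comments. This is NOT the real
# OMIM file format; it only illustrates the parsing pattern.
SAMPLE = """\
# mim_id\tgene\tdisease
900001\tGENE_A\tDisease one
900002\tGENE_A\tDisease two
900003\tGENE_B\tDisease three
"""


def parse_gene_disease_table(handle):
    """Return {gene symbol: [(mim_id, disease name), ...]}."""
    gene_to_diseases = defaultdict(list)
    data_lines = (line for line in handle if not line.startswith("#"))
    for row in csv.reader(data_lines, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        mim_id, gene, disease = row[0], row[1], row[2]
        gene_to_diseases[gene].append((mim_id, disease))
    return dict(gene_to_diseases)


gene_map = parse_gene_disease_table(io.StringIO(SAMPLE))
```

With a real download, the same function could be given an open file handle, provided the column positions are adjusted to the file actually in use.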
We defined ocular and systemic phenotype groups based on similar features shared among the individual phenotypes. The relationship between each individual phenotype and the number of disorders sharing it was computed. For example, the number of disorders caused by each gene and the ocular and systemic phenotypes shared by these diseases per gene were computed. Poorly described MIM IDs for which no phenotype description was available were not considered for further analysis. 
Databases such as DisGeNET5 and Human Phenotype Ontology (HPO6) were explored to obtain more information. Most of these databases contain phenotype annotations on human monogenic disorders including eye diseases. They consist mainly of clinical features for each disease, whereas the collection of features organized by phenotype, ocular as well as systemic, is specific to our study. Moreover, these databases are mainly based on OMIM, which is the primary information database used in our study as well; therefore, no additional information from these publicly available databases is included at this time in our analysis. 
Each ocular disease was checked for the availability of ocular and systemic phenotypes, and the number of genes causing these diseases was calculated. Based on this, disorders that contained both ocular and systemic phenotypic descriptions were separated first, along with their causal genes. Next, diseases that have only ocular or only systemic phenotypes were counted, together with their causal genes. The average number of ocular and systemic features per disease unit was also calculated. 
For each ocular phenotype, the number of diseases sharing it was obtained. For each disease, the number of ocular and systemic phenotypes was collected. The distribution of ocular features among ocular disease units was obtained. Similarly, the distributions of ocular features and of the disease units sharing them were identified for each ocular group. Furthermore, the most common ocular feature among the ocular disease units was identified. Individual ocular features were ranked based on the number of ocular disease units sharing them. Systemic features were grouped into broader classes and ranked. Of the 527 ocular disease units, the number of syndromes was identified. 
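The counting and ranking steps described above could be sketched as follows. The disease names, feature annotations, and function names are hypothetical placeholders chosen for illustration; this is not the authors' pipeline, only one plausible way to tabulate such data.

```python
from collections import Counter

# Hypothetical disease -> ocular feature annotations (illustrative only).
disease_features = {
    "disease_1": {"nystagmus", "myopia"},
    "disease_2": {"nystagmus"},
    "disease_3": {"nystagmus", "strabismus", "myopia"},
    "disease_4": set(),  # no ocular description available
}


def rank_features(annotations):
    """Rank features by the number of disease units that share them."""
    counts = Counter(f for feats in annotations.values() for f in feats)
    # Sort by descending count, then alphabetically for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def feature_count_distribution(annotations):
    """Map 'number of features' to 'number of disease units with that many'."""
    return Counter(len(feats) for feats in annotations.values())


ranking = rank_features(disease_features)
distribution = feature_count_distribution(disease_features)
```

Here `ranking` lists the most widely shared feature first, and `distribution` records how many disease units carry zero, one, two, or more features.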
Phenotype-Based Cluster Analysis
For functional clustering based on shared phenotypes, we assigned a “1” or “0” for the presence or absence of the specific phenotype (features derived from OMIM), respectively, for 787 independent ocular phenotypes. Thus each of these phenotypes represents a unique dimension in the multidimensional space. Next, the Raup-Crick index method in the PAST data analysis package (University of Oslo, Norway; available in the public domain at http://folk.uio.no/ohammer/past) was used to perform hierarchical clustering of this binary data. From the main matrix, the 10 most represented phenotypic units were filtered and a similar cluster dendrogram was created for each of them. For both the unfiltered matrix and the top 10 phenotypic units, the clusters above an 80% similarity index were selected for further analyses. 
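The presence/absence encoding can be sketched in a few lines. The toy annotations and function names below are hypothetical. The pairwise score uses the Jaccard index purely as a simple stand-in: it is not the Raup-Crick index computed by PAST, which relies on a randomization procedure and is not reimplemented here.

```python
# Hypothetical annotations; the study used 787 ocular phenotypes from OMIM.
annotations = {
    "d1": {"nystagmus", "myopia"},
    "d2": {"nystagmus"},
    "d3": {"glaucoma"},
}


def binary_matrix(annotations):
    """Encode each disease as a row of 1/0 flags, one column per phenotype."""
    features = sorted({f for feats in annotations.values() for f in feats})
    diseases = sorted(annotations)
    matrix = [
        [1 if f in annotations[d] else 0 for f in features] for d in diseases
    ]
    return diseases, features, matrix


def jaccard(row_a, row_b):
    """Shared phenotypes divided by phenotypes present in either disease.

    Stand-in similarity only; NOT the Raup-Crick index used in the study.
    """
    both = sum(1 for a, b in zip(row_a, row_b) if a and b)
    either = sum(1 for a, b in zip(row_a, row_b) if a or b)
    return both / either if either else 0.0


diseases, features, matrix = binary_matrix(annotations)
similarity_d1_d2 = jaccard(matrix[0], matrix[1])
```

Each column of `matrix` corresponds to one dimension of the phenotype space described above, so any presence/absence similarity measure can be applied to pairs of rows.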
Network Analysis
The genes present in the largest cluster above the 80% similarity index for each of the top 10 phenotypic units and for the unfiltered “all ocular phenotypes” set were further considered for network analyses using the GeneMANIA7 module in Cytoscape.8 For all the analyses, Gene Ontology-Biological Processes was set as the default. The default preset in GeneMANIA was also used, which allows 20 genes related to the query genes (those from the largest cluster for each phenotypic unit) to form a network. 
Results
Genetic eye diseases (GEDs) from OMIM were deconvolved into ocular and systemic features as shown in Figures 1B, 1C, and Supplementary Table S1. We excluded 7 of the 527 diseases from our study because they were polygenic. The remaining 520 diseases were taken for our analyses and designated as GEDs. A total of 440 genes were found to be linked with these 520 GEDs. Of these, 383 genes were each linked to a single disease. Similarly, 39, 15, and 2 genes were linked to 2, 3, and 4 diseases, respectively. A single gene, paired box protein-6 (PAX6), was linked with the maximum number of six distinct but related GEDs (Figs. 2A, 2B). The mode of inheritance of GEDs was variable, with a majority being autosomal (∼75%). Autosomal recessive (43.69%) was the most represented mode of inheritance, followed by autosomal dominant (32.02%). X-linked inheritance was found in ∼9% of GEDs, of which 3.2% were dominant and 3.58% were recessive. Only a minority were sporadic (1.88%), multifactorial (0.38%), somatic (0.38%), or mitochondrial (0.38%). The mode of inheritance was unknown in almost 12% of GEDs (Fig. 2C). 
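The gene and disease totals reported above can be cross-checked with simple arithmetic. The sketch below uses only the counts given in the text and assumes that each of the 520 diseases is counted under exactly one gene; the variable names are illustrative.

```python
# Number of genes (value) linked to k diseases (key), as reported in the text.
genes_by_disease_count = {1: 383, 2: 39, 3: 15, 4: 2, 6: 1}

# Every gene is counted once, whatever its number of diseases.
total_genes = sum(genes_by_disease_count.values())

# Assumption: each disease is attributed to exactly one gene, so the
# disease total is the sum of k * (number of genes with k diseases).
total_diseases = sum(k * n for k, n in genes_by_disease_count.items())

# Genes linked to more than one disease are pleiotropic.
pleiotropic_genes = sum(n for k, n in genes_by_disease_count.items() if k > 1)
```

Under that assumption the totals come to 440 genes and 520 diseases, with 57 pleiotropic genes, in agreement with the figures reported for Figure 2A.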
Figure 2
 
Gene distribution and mode of inheritance in genetic ocular diseases. (A) Mutations in individual protein-coding loci can cause multiple unique genetic eye diseases. Seven entries were further filtered from 527 entries as they are polygenic; 440 genes/loci have been found to be associated with 520 unique eye diseases. A total of 383 entries had one gene involved in one disease. The remaining 57 genes are pleiotropic in nature. In only one case, the maximum number of diseases (n = 6) is connected with a single gene. (B) Pie chart showing the distribution of genes and diseases involving the eye. (C) The mode of inheritance in GEDs is variable, with the greatest number falling under the autosomal recessive mode of inheritance (43.69%), followed by autosomal dominant (32.02%), X-linked (9.23%), and others. Mode of inheritance was unknown for 12.05% of diseases.
Of the 520 disease units, there were a total of 787 ocular features for 363 disease units and 3094 systemic features for 380 disease units (Fig. 3A; Supplementary Table S2), with an average of 2.17 ocular and 8.14 systemic features, respectively, per disease unit. A total of 314 and 324 genes are reported in OMIM to be mutated in the 363 and 380 GEDs, respectively. Finally, 281 GEDs (common between the 363 and 380 disease units) were found to be caused by 251 genes (common between the 314 and 324 genes; Fig. 3A, Supplementary Table S3). Our analysis indicated that of these 520 unique eye diseases defined in the space for all GEDs, only 82 are “pure geno-ocular diseases,” exhibiting ocular phenotypes without systemic manifestations. This is not surprising, as few genes are likely to function only in an eye-restricted manner. Similarly, 99 disease units contain only systemic features, and 62 of the 520 diseases have no defined ocular or systemic phenotype in OMIM (omim.txt version available in November 2013; Fig. 3A, Supplementary Table S1). We randomly checked those diseases for possible ocular connections in the OMIM description and found isolated, unreplicated case reports suggesting the presence of one or more eye-related pathophenotypes associated with them. The distributions of ocular and systemic features per disease unit are outlined in Figure 3B. 
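The ocular-only and systemic-only counts, as well as the per-disease averages, follow directly from the numbers reported above. The short sketch below reproduces that arithmetic; the variable names are illustrative and the averages are taken as total features divided by the number of disease units carrying that feature type.

```python
# Counts reported in the text.
with_ocular, with_systemic, with_both = 363, 380, 281
ocular_features, systemic_features = 787, 3094

# Diseases with ocular features but no systemic ones, and vice versa.
ocular_only = with_ocular - with_both
systemic_only = with_systemic - with_both

# Average features per disease unit, rounded to two decimals as in the text.
mean_ocular = round(ocular_features / with_ocular, 2)
mean_systemic = round(systemic_features / with_systemic, 2)
```

These values match the 82 ocular-only and 99 systemic-only disease units and the averages of 2.17 and 8.14 features per disease unit.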
Figure 3
 
Distribution of unique ocular and systemic phenotype units in genetic ocular diseases. (A) Altogether, 787 ocular and 3094 systemic phenotypes were derived from OMIM for 520 diseases. Further, 363 MIM disease IDs were obtained for 787 unique ocular phenotypes, finally connected to 314 gene IDs. Similarly, 380 MIM disease IDs were obtained for 3094 systemic phenotypes connected to 324 gene IDs. A Venn diagram identifies 251 genes involved in 281 diseases having both ocular and systemic phenotypic features, which were taken for further analysis. (B) Ocular and systemic features are shown in descending order in terms of absolute number for the 281 discrete disease units having both ocular and systemic phenotypes.
Furthermore, we categorized GEDs based on their ocular and systemic features into broad groups. The ocular phenotype groups are mostly based on anatomically defined ocular compartments. Some exceptions, such as developmental/congenital condition, axial length, or malignancy, define disease conditions rather than ocular structures. Similarly, the systemic phenotype groups consist mostly of organ systems. Altogether, 20 and 21 groups were created to accommodate all ocular and systemic features, respectively (Table 1; Supplementary Table S4). The prevalence of ocular and systemic features per ocular phenotypic group and systemic phenotypic group is depicted in Figures 4A and 4B (see also Table 1, Supplementary Table S4). Figure 4A shows a ranking of systemic features associated with the 380 GEDs. In aggregate, neurological phenotypes represent the most common class of systemic features. This is mostly because a major portion of the eye originates from the neural tube and neural crest layers. Consistent with this finding, retina tops the list of ocular disease groups in number of diseases and phenotypes. Interestingly, the trend in the number of diseases and phenotypes for each group is almost opposite for systemic and ocular groups. A little more than half (52%) of the systemic groups have a greater number of phenotypes than diseases (“neurologic” to “immunology” in Fig. 4A), while 70% of the ocular groups have a greater number of diseases than phenotypes (“extraocular muscle” to “malignant” in Fig. 4B). 
Table 1
 
Ocular and Systemic Phenotype Groups in GEDs
Figure 4
 
Distribution of disease and phenotype units in systemic and ocular phenotype groups. The phenotypic group is shown on the x-axis, and the absolute number of disease units and phenotypes exhibiting each group is shown on the y-axis. (A) Systemic phenotype groups. (B) Ocular phenotype groups.
We next focused on the ocular features associated with the GEDs. The distribution of GEDs by number of ocular phenotypes was examined, and most disease units (∼44%) were found to have only one or two ocular phenotypes. The remaining diseases were found to have multiple distinct ocular features (Fig. 5A). For instance, retinal cone dystrophy (OMIM 610356) has features of photophobia, myopia, astigmatism, strabismus, nystagmus, nyctalopia, macular atrophy, macular dysfunction, central scotoma on photopic and dark-adapted perimetry testing, and peripheral sensitivity loss. Although all these findings may be explained as direct or indirect consequences of KCNV2 mutations, we annotated these features separately to delineate the phenotypic relationship between retinal cone dystrophy and other conditions with nystagmus or macular dysfunction (Supplementary Table S1). 
Figure 5
 
Disease-phenotype distribution and ranking of phenotype units in ocular diseases with distribution of systemic phenotype groups among top-ranked ocular phenotype units. (A) Distribution of ocular features among discrete eye disease units. Most genetic eye diseases have one or two dominant ocular features; however, there are many diseases with multiple distinct ocular features. Number of ocular features is shown along the x-axis. (B) Ranking of ocular phenotypic units. The ocular phenotypic units are shown in descending order in terms of their presence in ocular diseases. (C) Stacked columns of top 10 ocular phenotypes show number of systemic phenotypes associated and distribution of all systemic phenotype groups among them. Neurologic, skeletal, and craniofacial groups are boxed with solid lines.
To uncover the hidden phenotypic structure within the disease space, we carried out an analysis aimed at identifying subgroups of diseases sharing similar phenotypic features on a global level (Fig. 1A). The analysis focused on the most common ocular phenotypes in GEDs (Fig. 5B, Supplementary Table S5). Nystagmus (EOM35), associated with 64 GEDs, is the most prevalent phenotype, followed by hypertelorism (DV29), myopia (AL11), strabismus (EOM64), and microphthalmia (DV12; Table 2, Supplementary Table S5). A number of systemic phenotypes were found to be associated with GEDs featuring the top 10 ocular phenotypes. For example, the 64 disease units that have EOM35 as a phenotype also contain a total of 643 unique systemic phenotype units, among which the neurologic systemic phenotype group (289) is predominant (Table 2). In fact, among all systemic phenotype groups, neurologic (NEU) is the most prevalent one in the top 10 ocular phenotype units, followed by skeletal (SKL), craniofacial (HNC), and gastrointestinal (GIA). A detailed distribution of systemic phenotype groups across the top 10 ocular phenotypes is outlined in Figure 5C. 
Table 2
 
Ocular Phenotypes Mostly Represented in Ocular Diseases
To identify genetic interactions in ocular diseases sharing common ocular phenotypes, the top 10 ocular findings depicted in Table 2 were further used for cluster analysis (Fig. 6). A 1 or 0 was assigned for the presence or absence of each specific ocular feature in a particular GED unit (Supplementary Table S6). Thus in the final analysis, the subspace defined by a particular ocular phenotype was represented as a matrix of 1's and 0's, with the GED units on one axis and the 787 ocular phenotypes on the other (Supplementary Table S7). Figure 6 shows the similarity indexes based on feature clustering using Raup–Crick metrics for presence/absence data (http://folk.uio.no/ohammer/past). 
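Selecting clusters above a similarity cutoff and picking the largest one can be sketched as below. The pairwise scores are hypothetical, and the grouping rule (joining any two diseases whose similarity exceeds the threshold, i.e., single linkage) is an assumption made for illustration; the linkage method used by PAST to build the dendrograms is not stated in the text and may differ.

```python
def clusters_above(similarity, items, threshold=0.8):
    """Group items linked by a pairwise similarity above the threshold.

    Single-linkage grouping via union-find; an illustrative assumption,
    not necessarily the linkage used to build the published dendrograms.
    """
    parent = {item: item for item in items}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), score in similarity.items():
        if score > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for item in items:
        groups.setdefault(find(item), []).append(item)
    # Largest cluster first; ties broken by member names for determinism.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


# Hypothetical pairwise similarity scores between four disease units.
scores = {("d1", "d2"): 0.9, ("d2", "d3"): 0.85, ("d3", "d4"): 0.4, ("d1", "d4"): 0.1}
clusters = clusters_above(scores, ["d1", "d2", "d3", "d4"])
largest_cluster = clusters[0]
```

The genes attached to the members of `largest_cluster` would then form the query set for a downstream network analysis.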
Figure 6
 
Dendrograms of ocular phenotype units mostly represented in ocular diseases. Cluster analysis of top 10 ocular phenotypes. Similar analysis was done on all ocular phenotypes combined. Clusters above a score of 0.8 were considered for GeneMANIA analyses of biological networks.
To identify meaningful biological processes and pathways among GED loci sharing common phenotypes, genes associated with the largest disease cluster from each dendrogram of the 10 phenotypic units were subjected to network analysis using GeneMANIA (Supplementary Table S8). More than 90% of the genes derived from individual disease clusters of each phenotypic unit form a network (Fig. 7). In all clustered gene-set analyses, physical interaction ranked highest, followed by coexpression, among the percent weight parameters of the edges of all networks. For all networks, we considered two attributes in GeneMANIA: biological processes (based on Gene Ontology, GO) and consolidated pathways-2013. The analysis is summarized in Table 3. For all gene networks, the top biological process matched the top consolidated pathways-2013, suggesting consistency in the functional nature of the network. For example, the importance of extracellular matrix structure and function is well known in glaucoma9,10; our study, which designated glaucoma as ONH3, showed similar results (Table 3). In most cases, a group of similar functions emerges as the major attribute, involving a few genes present within a cluster. For instance, four out of eight query genes (COL2A1, COL11A1, PLOD1, and NF1) in ONH3-cluster 4 are included in extracellular structure organization (Fig. 7; Table 3). However, for EL8-cluster 3, genes present within the cluster showed equally important biological processes (cholesterol biosynthetic process, q value = 1.38 × 10⁻⁵, and histone methyltransferase complex, q value = 1.47 × 10⁻⁴), although the top consolidated pathway-2013 for EL8-cluster 3 is cholesterol biosynthesis. 
Figure 7
 
Representative biological networks of clusters obtained from ocular phenotype units. Biological networks were derived for clusters using the GeneMANIA plugin in Cytoscape. Genes corresponding to the largest cluster for each phenotype unit serve as the query, while the default setting in GeneMANIA allowed 20 related genes to form the network along with the query genes. The downward arrow-shaped nodes represent genes involved in the top biological process and top consolidated pathways-2013 for each network. Triangle-shaped nodes in DV29-cluster 3 are representatives of visual pathways that are present but not shown in other networks. Color codes for nodes and edges are given in the figure. The thickness of an edge in a network depends on the weight score in GeneMANIA; a thicker edge indicates stronger evidential support for that specific type of edge between two nodes. For example, a thick pink edge between PLOD3 and PLOD1 in ONH3-cluster 4 indicates strong evidence of physical interaction between them.
Table 3
 
Biological Network Analyses of Largest Cluster From 10 Phenotype Units
Discussion
In this study, we report a general method for systematizing and deploying phenotypes, using GEDs as a specific proof of principle. There are several notable observations. More than 73.6% of GEDs are caused by single-gene defects, and 68.8% of the GEDs possess at least one ocular feature defined in OMIM. Both one-gene-to-one-disease and one-gene-to-many-disease associations are observed for these ocular diseases. The larger the pleiotropic effects in genetic disorders, the greater the genomic effort needed to resolve the complexity. Hierarchical clustering on phenotypes can reveal the underlying structure of a disease based on its features. Hence, clinical findings can be digitized into discrete elements within the feature space, in much the same way that a transcriptome can be digitized by transcripts. Our analysis of disease subgroups uncovered several hidden groupings, including those related to central nervous system diseases such as neuropathy and to developmental/congenital conditions. It is not surprising that Joubert syndromes 2 and 14 (OMIM IDs 608091, 614424) grouped more closely with the central nervous system conditions cerebral dysgenesis, neuropathy, ichthyosis, and palmoplantar keratoderma syndrome (OMIM ID 609528) rather than with Joubert syndromes 17, 18, 2, 3, 6, 7, 9 (OMIM IDs 614615, 614815, 608091, 608629, 610688, 611560, and 612285, respectively), in which systemic features play a more significant role.
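The digitization of clinical findings into discrete elements can be illustrated with a minimal sketch. It assumes a toy set of diseases and phenotype-unit assignments (the disease names and assignments below are hypothetical, not drawn from the study's data) and uses the Jaccard index as one plausible similarity measure; the text does not state which measure was used.

```python
# Toy example: hypothetical diseases and phenotype-unit assignments (not study data).
disease_features = {
    "disease_A": {"ONH3", "AL11", "LE25"},
    "disease_B": {"ONH3", "AL11"},
    "disease_C": {"DV12", "DV29"},
}

def binary_matrix(features):
    """Return (sorted phenotype columns, {disease: 0-1 row}) for a feature map."""
    columns = sorted(set().union(*features.values()))
    rows = {d: [1 if c in f else 0 for c in columns] for d, f in features.items()}
    return columns, rows

def jaccard(row_a, row_b):
    """Jaccard similarity of two 0-1 rows: shared features / features in either row."""
    both = sum(1 for a, b in zip(row_a, row_b) if a and b)
    either = sum(1 for a, b in zip(row_a, row_b) if a or b)
    return both / either if either else 0.0

columns, rows = binary_matrix(disease_features)
similarity_ab = jaccard(rows["disease_A"], rows["disease_B"])
similarity_ac = jaccard(rows["disease_A"], rows["disease_C"])
```

In this toy case, the first two diseases share two of three features while the third shares none, which is the kind of contrast a clustering step would pick up.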
Pathway analysis using the GeneMANIA module in Cytoscape identified complex relationships among numerous GED loci. For example, two well-populated nodes involving the COL11A1 and COL2A1 genes dominate the networks for phenotype units ONH3 (glaucoma) and AL11 (myopia; Fig. 7), which belong to different ocular phenotype groups (optic nerve head for glaucoma and axial length for myopia). In this case, disease units that show significant phenotype overlap are often clinically defined as a single entity (e.g., Stickler syndrome I and II, caused by mutations in the COL2A1 and COL11A1 genes, respectively). Also, diseases with noticeable overlap in phenotype units are caused by mutations in loci that fall into a single pathway (extracellular matrix organization; MYOC–LTBP2–PLOD1; Table 3, Supplementary Table S8). In a different scenario, phenotypes in the disease units segregate into separate functional networks despite being classified in the same phenotype group (developmental/congenital condition), as illustrated by the phenotype units DV12 (microphthalmia) and DV29 (hypertelorism; Fig. 7, Table 3, Supplementary Table S8). Interestingly, the top 10 phenotypic units in GEDs vary in the severity of the vision impairment they cause. For example, glaucoma (ONH3) is much more debilitating than ptosis (EL29), although ptosis is slightly more represented than glaucoma as a phenotypic unit in GEDs. In another instance, the number of GEDs having hypertelorism (DV29) as a phenotype is greater than the number featuring microphthalmia (DV12; Table 3). These observations are intriguing because microphthalmia is a more severe pathophenotype in ocular development than hypertelorism. A possible reason is the arrangement of phenotype descriptions in OMIM, where severe, blinding disorders such as microphthalmia (DV12) or glaucoma (ONH3) are also presented as phenotype units. In contrast, hypertelorism (DV29) is a craniofacial abnormality associated with many ocular and nonocular diseases and syndromes that have more systemic than ocular features.
One possible criticism of using the top 10 phenotype units for cluster analysis is that a single phenotype unit mostly contains diseases featuring that specific unit, along with related ones, and would therefore form a predictable cluster from the dataset. To counter this, we created clusters using the 0-1 matrix containing all ocular phenotypes (Fig. 8, Supplementary Table S6). A similar analysis revealed a biological network from the largest cluster of the all-ocular-phenotype dataset, in which “acetyl CoA biosynthetic process from pyruvate” and “carbohydrate metabolism” emerged as the top biological process and top consolidated pathways-2013, respectively (Fig. 8).
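Clustering a 0-1 disease-by-phenotype dataset at a similarity cutoff can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the diseases and phenotype labels are hypothetical, Jaccard similarity is assumed, and grouping is done by single linkage at a 0.8 threshold, implemented as connected components of the thresholded similarity graph. The linkage method and metric actually used are not given in this section.

```python
from itertools import combinations

# Hypothetical input: each disease maps to its set of ocular phenotype units.
disease_features = {
    "d1": {"p1", "p2", "p3", "p4", "p5"},
    "d2": {"p1", "p2", "p3", "p4", "p5"},
    "d3": {"p1", "p2", "p3", "p4"},
    "d4": {"p6", "p7"},
    "d5": {"p6", "p7"},
    "d6": {"p8"},
}

def jaccard(a, b):
    """Jaccard similarity of two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def clusters_at_threshold(features, threshold=0.8):
    """Single-linkage grouping: link diseases whose similarity >= threshold,
    then return the connected components, largest first."""
    parent = {d: d for d in features}

    def find(x):
        # Follow parent pointers to the component root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(features, 2):
        if jaccard(features[a], features[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for d in features:
        groups.setdefault(find(d), set()).add(d)
    return sorted(groups.values(), key=lambda g: (-len(g), sorted(g)))

clusters = clusters_at_threshold(disease_features)
largest = clusters[0]  # genes of the largest cluster would seed a network query
```

Taking the largest cluster mirrors the step in which genes from the largest cluster serve as the query for network analysis; mapping diseases to genes is left out of the sketch.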
Figure 8
 
Functional analysis of the largest cluster from all ocular phenotypes combined. Dendrogram of all ocular phenotypes combined. At an 80% similarity index, the largest cluster (cluster 29) of all ocular phenotypes forms a biological network in which “acetyl-CoA biosynthetic process from pyruvate” appears as the top biological process (GO) and “metabolism” as the top consolidated pathways-2013. Network legends are depicted in the top right corner. The downward arrow query genes are involved in the top biological process, while the triangle-shaped query genes are included in visual perception (q value = 1.01 × 10⁻³).
The abundance of neurologic or craniofacial systemic phenotypes in diseases having the top 10 ocular phenotypes (Fig. 5C) points to the spatiotemporal proximity of the eye to the central nervous system and cranium during development. However, the consistent presence of skeletal, cardiovascular, and in some cases gastrointestinal phenotypes in these diseases (Fig. 5C) indicates a possible relationship between specific systemic phenotypes and the ocular ones. Nystagmus, strabismus, and ptosis showed a relatively greater number of associated systemic phenotypes (>500), while microphthalmia, glaucoma, and cataract showed a smaller number (<400; Fig. 5C). This result suggests that more systemic phenotypes are associated with comparatively less severe blinding phenotypes, as vision impairment is much less intense in nystagmus, strabismus, and ptosis than in glaucoma (ONH3), microphthalmia (DV12), or cataract (LE25). Interestingly, for glaucoma (ONH3) the number of skeletal phenotypes (69) is greater than the number of neurologic phenotypes (50). This is intriguing because glaucoma is a neurodegenerative disease in which more neurologic phenotypes would be expected. When we looked specifically at the diseases having glaucoma as a phenotype, we detected seven with more than five skeletal features each, which together account for the greater number of skeletal features associated with glaucoma. Five of these seven diseases are syndromes associated with multiple phenotypes, with skeletal being the major one. The remaining two are diseases in which skeletal features are composite with ocular phenotypes (Supplementary Table S8). Little reported evidence establishes a connection between skeletal phenotypes and glaucoma.
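The kind of tally behind such comparisons can be sketched with a small counter. The records below are hypothetical placeholders (disease names, features, and group assignments are invented for illustration), and the counting rule, one count per systemic feature per disease carrying the ocular unit, is an assumption rather than the documented procedure.

```python
from collections import Counter, defaultdict

# Hypothetical records: (disease, ocular phenotype units, systemic (group, feature) pairs).
records = [
    ("disease_A", {"ONH3"}, [("skeletal", "scoliosis"), ("skeletal", "short stature")]),
    ("disease_B", {"ONH3", "EL29"}, [("neurologic", "seizures")]),
    ("disease_C", {"EL29"}, [("neurologic", "ataxia"), ("craniofacial", "micrognathia")]),
]

def systemic_profile(records):
    """Count systemic phenotypes per systemic group for each ocular phenotype unit."""
    profile = defaultdict(Counter)
    for _disease, ocular_units, systemic in records:
        for unit in ocular_units:
            for group, _feature in systemic:
                profile[unit][group] += 1
    return profile

profile = systemic_profile(records)
# Total systemic phenotypes associated with each ocular unit (height of a stacked column).
totals = {unit: sum(counts.values()) for unit, counts in profile.items()}
```

Comparing `profile["ONH3"]["skeletal"]` with `profile["ONH3"]["neurologic"]` in real data would correspond to the skeletal versus neurologic comparison made for glaucoma.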
Similarly, the presence of unusual craniofacial characteristics along the HRAS–RAF1–PTPN11 interaction recapitulates the known dysregulation of the RAS/mitogen-activated protein kinase (MAPK) pathway. Moreover, specific mutations in PTPN11 have been identified in LEOPARD (multiple lentigines, electrocardiographic conduction abnormalities, ocular hypertelorism, pulmonary stenosis, abnormal genitalia, retardation of growth, and sensorineural deafness) syndrome,11 which suggests that PTPN11 may have more general effects. Phenotype deconstruction can thus be very powerful for testing novel ideas that are mostly unexplored.
Other resources on human disorders are available in public databases such as DisGeNET and HPO.5,6 The primary source of information for both these databases and our study is OMIM. Our study focused primarily on analyzing the phenotype features of ocular diseases, linking the ocular and systemic annotations for each disease, and identifying diseases that share individual ocular and systemic features. Our analysis is intended to provide insight into phenotypically similar disorders that share phenotype features, such as Noonan syndrome (Supplementary Table S1). Studying phenotypic similarities is of great importance because it can reveal groups of genes in pathways or biochemical modules in which dysfunction could lead to similar phenotypic consequences.12
Precise clinical observation defines the strength of phenotype clustering. One limitation of this study is the potential incompleteness of the available phenotypic data. OMIM describes the majority of human Mendelian syndromes in detail; however, computational analysis of the data contained in OMIM has so far been difficult because of the lack of a controlled vocabulary. A database containing phenotype annotation for all organ systems and a more robust disease catalog would permit more powerful analyses. The association between disease, gene, and phenotype could be established at the pathophysiological level once gene expression analyses become available for the different components/tissues of the eye; the lack of such data prevented us from including them in these analyses. Therefore, we attempted to form categories and minimize feature deconstruction to reduce the effect of inaccurate observations and bias in the phenotypes reported in clinics for ocular diseases.
Phenotype deconvolution is an effective approach to explore and catalog relationships among human diseases through their underlying features. This approach could be applied to diseases of any organ. Furthermore, relationships among the phenotype, genotype, and feature spaces can offer a view of the biological mechanisms genetically linked to human disease.
Acknowledgments
The authors would like to thank Mahua Maulik, PhD, for scientific discussion and critical reading of the manuscript. Supported by National Institute of Biomedical Genomics Intramural funds (PP, MA). The authors alone are responsible for the content and writing of the paper. 
Disclosure: P. Pandey, None; M. Acharya, None 
References
1. Feramisco JD, Sadreyev RI, Murray ML, Grishin NV, Tsao H. Phenotypic and genotypic analyses of genetic skin disease through the Online Mendelian Inheritance in Man (OMIM) database. J Invest Dermatol. 2009; 129: 2628–2636.
2. Scharfe C, Lu HH, Neuenburg JK, et al. Mapping gene associations in human mitochondria using clinical disease phenotypes. PLoS Comput Biol. 2009; 5: e1000374.
3. Jeganathan VS, Wang JJ, Wong TY. Ocular associations of diabetes other than diabetic retinopathy. Diabetes Care. 2008; 31: 1905–1912.
4. Hollyfield JG, Bonilha VL, Rayborn ME, et al. Oxidative damage-induced inflammation initiates age-related macular degeneration. Nat Med. 2008; 14: 194–198.
5. Piñero J, Queralt-Rosinach N, Bravo A, et al. DisGeNET: a discovery platform for the dynamical exploration of human diseases and their genes. Database (Oxford). 2015; 2015: bav028.
6. Köhler S, Doelken SC, Mungall CJ, et al. The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data. Nucleic Acids Res. 2014; 42: D966–D974.
7. Warde-Farley D, Donaldson SL, Comes O, et al. The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function. Nucleic Acids Res. 2010; 38: W214–W220.
8. Shannon P, Markiel A, Ozier O, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003; 13: 2498–2504.
9. Hernandez MR, Ye H. Glaucoma: changes in extracellular matrix in the optic nerve head. Ann Med. 1993; 25: 309–315.
10. Acott TS, Kelley MJ. Extracellular matrix in the trabecular meshwork. Exp Eye Res. 2008; 86: 543–561.
11. Aoki Y, Niihori T, Narumi Y, Kure S, Matsubara Y. The RAS/MAPK syndromes: novel roles of the RAS pathway in human genetic disorders. Hum Mutat. 2008; 29: 992–1006.
12. Goh KI, Cusick ME, Valle D, et al. The human disease network. Proc Natl Acad Sci U S A. 2007; 104: 8685–8690.
Figure 1
 
Workflow of disease deconstruction, OMIM analysis, and filtering in genetic ocular diseases. (A) Three interactive levels are depicted: genes, diseases, and phenotypes. Clinical histories, symptoms, and findings constitute the phenotype (p1, p2, … pn) level, while disorders and genetic loci constitute the disease (d1, d2, … dn) and gene (g1, g2, … gn) levels. Pleiotropic genes are shown as multiple edges from a single “g” node to multiple “d” nodes. Phenotype overlaps in diseases are shown between the “d” and “p” levels. Cluster analysis was performed considering individual or all phenotype units in the “p” level. Biological networks were obtained from clusters with corresponding disease and gene information. (B) OMIM was searched using “eye” as the keyword, which yielded 1308 entries in the 6 separate categories listed here. Three standard filters were applied to achieve the minimum number of entries for PMU and the maximum number of entries matched between PMK and GD. (C) Bar diagram showing the original search result and the results obtained using different filters. Among them, “OMIM records linked to gene specific displays of SNPs” resulted in 527 entries, equal in PMK and GD and none for PMU. Hence, 527 entries were taken for further analysis.
Figure 2
 
Gene distribution and mode of inheritance in genetic ocular diseases. (A) Mutations in individual protein-coding loci can cause multiple unique genetic eye diseases. Seven of the 527 entries were filtered out because they are polygenic; 440 genes/loci were found to be associated with 520 unique eye diseases. A total of 383 entries had one gene involved in one disease. The remaining 57 genes are pleiotropic in nature. In only one case is the maximum number of diseases (n = 6) connected with a single gene. (B) Pie chart showing the distribution of genes and diseases involving the eye. (C) The mode of inheritance in GEDs is variable, with the greatest number falling under the autosomal recessive mode of inheritance (43.69%), followed by autosomal dominant (32.02%), X-linked (9.23%), and others. The mode of inheritance was unknown for 12.05% of diseases.
Figure 3
 
Distribution of unique ocular and systemic phenotype units in genetic ocular diseases. (A) Altogether, 787 ocular and 3094 systemic phenotypes were derived from OMIM for 520 diseases. Further, 363 MIM disease IDs were obtained for the 787 unique ocular phenotypes, ultimately connected to 314 gene IDs. Similarly, 380 MIM disease IDs were obtained for the 3094 systemic phenotypes, connected to 324 gene IDs. A Venn diagram identifies 251 genes involved in 281 diseases having both ocular and systemic phenotypic features, which were taken for further analysis. (B) Ocular and systemic features are shown in descending order in terms of the absolute number of the 281 discrete disease units having both ocular and systemic phenotypes.
Figure 4
 
Distribution of disease and phenotype units in systemic and ocular phenotype groups. The phenotypic group is shown on the x-axis, and the absolute number of disease units and phenotypes exhibiting each group is shown on the y-axis. (A) Systemic phenotype groups. (B) Ocular phenotype groups.
Figure 5
 
Disease-phenotype distribution and ranking of phenotype units in ocular diseases with distribution of systemic phenotype groups among top-ranked ocular phenotype units. (A) Distribution of ocular features among discrete eye disease units. Most genetic eye diseases have one or two dominant ocular features; however, there are many diseases with multiple distinct ocular features. Number of ocular features is shown along the x-axis. (B) Ranking of ocular phenotypic units. The ocular phenotypic units are shown in descending order in terms of their presence in ocular diseases. (C) Stacked columns of top 10 ocular phenotypes show number of systemic phenotypes associated and distribution of all systemic phenotype groups among them. Neurologic, skeletal, and craniofacial groups are boxed with solid lines.
Figure 6
 
Dendrograms of the ocular phenotype units most represented in ocular diseases. Cluster analysis of the top 10 ocular phenotypes. A similar analysis was done on all ocular phenotypes combined. Clusters above a score of 0.8 were considered for GeneMANIA analyses of biological networks.
Table 1
 
Ocular and Systemic Phenotype Groups in GEDs
Table 2
 
Ocular Phenotypes Mostly Represented in Ocular Diseases