With the completion of the human genome [17] and of draft sequences for several model organisms, it is now important to understand the function and interaction of genes. To catalog and study genes relevant to retinal function, currently available data are being collected to generate a mammalian catalog of genes expressed in the retina [38]. Although this study indicates that as many as 15,000 genes are expressed in the retina and RPE, the results are a compilation from various sources with widely differing supporting data. Furthermore, of the 13,037 genes classified as high quality, fewer than 10% originate from nonhuman species, primarily mouse and cattle, and only 65 genes are reported for the dog. To address this gap, we have produced the first normalized canine retinal cDNA library, both to begin a complete annotation of canine retinal genes and to facilitate identification of the genes relevant to retinal disease research and comparative genomics. The nearly 4000 unique sequences in our current database cover approximately 27% to 30% of the proposed retinal catalog. The estimated total number of genes expressed in the retina, however, may include genes expressed only in the RPE. Although such genes may significantly affect retinal function, some (e.g., RPE65 [39–42]) are not present in our retinal library. This suggests that our database is indeed specific for genes expressed in the canine retina and may therefore represent a higher percentage of the genes expressed in the retina alone.
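The coverage figure above is a simple ratio of unique library sequences to the size of the reference catalog. The sketch below assumes the catalog holds between 13,037 genes (the high-quality set) and 15,000 genes, and uses 4000 as a rounded upper bound for the number of unique sequences; the function name catalog_coverage is hypothetical.

```python
def catalog_coverage(unique_sequences: int, catalog_size: int) -> float:
    """Fraction of a reference gene catalog represented by unique library sequences."""
    if catalog_size <= 0:
        raise ValueError("catalog_size must be positive")
    return unique_sequences / catalog_size


# 4000 is a rounded upper bound for the "nearly 4000" unique sequences.
for catalog_size in (15_000, 13_037):
    print(f"{catalog_size} genes: {catalog_coverage(4000, catalog_size):.1%}")
```

Under these assumptions the ratio runs from about 26.7% to 30.7%; because 4000 slightly exceeds the actual count, the upper value is marginally above the ∼30% quoted in the text.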
After normalization, redundancy among the investigated clones was 38%, which is within the range found in other studies [43], and the abundance of highly expressed retinal genes was reduced; rhodopsin, for example, is represented by only nine clones. Consequently, almost 80% of all unique library entries are supported by a single clone, and fewer than 5% of all assemblies merged more than 10 clones (Table 2). Through this process, we also confirmed one recombinant clone. Overall, the data confirm the successful normalization and high quality of the library.

The completed draft of the canine genome sequence has allowed us to locate more than 99% of the retinal sequences within the canine genome. The canine genome may be enriched for repetitive elements [16], which can occur within coding sequences and can also contribute significantly to developmental functions [44]. Although repetitive elements were masked with the RepeatMasker program (http://www.repeatmasker.org/ provided in the public domain by the Institute for Systems Biology, Seattle, WA) before the BLASTN search, clones containing them were not removed from the library, and their surrounding sequences may hinder unique alignment against the canine genome sequence draft. Alternatively, errors in the draft assembly, gene duplications, and gene families sharing common sequence motifs could also cause a single clone to align to more than one genomic region. These problems are common to automated annotation and cannot be overcome easily. We are currently mapping several clones that could not readily be placed by analytical and comparative genomics on the canine/hamster RHDF5000 radiation hybrid (RH) panel [45]. This process will help us better understand the dynamics of the canine genome, improve future mapping methods, and resolve discrepancies between the sequence and the RH map of the dog.
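Redundancy, singleton fraction, and the share of large assemblies can all be computed from a table assigning each sequenced clone to its assembly. The sketch below assumes the common definition redundancy = 1 − (unique assemblies ÷ sequenced clones); the text does not state the exact formula used, and the function name and toy data are hypothetical.

```python
from collections import Counter


def library_stats(clone_to_assembly: dict[str, str], large: int = 10) -> dict[str, float]:
    """Summarize redundancy of an EST library from clone -> assembly assignments.

    Redundancy is taken here as 1 - (unique assemblies / sequenced clones),
    an assumed definition; singletons are assemblies supported by one clone.
    """
    if not clone_to_assembly:
        raise ValueError("no clones supplied")
    sizes = Counter(clone_to_assembly.values())  # assembly ID -> number of clones
    n_clones = len(clone_to_assembly)
    n_unique = len(sizes)
    return {
        "redundancy": 1 - n_unique / n_clones,
        "singleton_fraction": sum(s == 1 for s in sizes.values()) / n_unique,
        "large_assembly_fraction": sum(s > large for s in sizes.values()) / n_unique,
    }


# Hypothetical toy library: 9 clones in 6 assemblies (A: 3 clones, B: 2, C to F: 1 each).
toy = {"c1": "A", "c2": "A", "c3": "A", "c4": "B", "c5": "B",
       "c6": "C", "c7": "D", "c8": "E", "c9": "F"}
print(library_stats(toy))
```

Applied to a real assignment table, the three values would correspond to the quantities summarized in Table 2, provided the definition matches the one used there.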
With only 32% of the canine retinal cDNA sequences corresponding to annotated canine cDNA entries, the library significantly increases experimental support for retinal genes not yet cloned in the dog. At the same time, annotation relies on comparative genomics, which in some cases is problematic because individual genes or splice variants may be absent from other species or not yet identified in them. Furthermore, when EST clones lie in the 3′ untranslated region (UTR) of a gene, cross-species homology is often insufficient for alignment.
We have analyzed several clones, particularly those not yielding intron–exon boundaries, and found that the library is indeed slightly biased toward the 3′ UTR. For at least two clones, chosen because of their position in the reported prcd disease interval [28,46], we confirmed that the corresponding genes are present in retinal mRNA pools and that the clones do not derive from genomic contamination. In addition, these clones have been found to be critical for the respective disorders and are currently under investigation (data not shown). An even more prominent issue with cDNA libraries is the clustering of nonoverlapping sequences into genes. To date, no algorithm can automate this procedure, and all nonoverlapping sequences are currently listed independently, even if assigned to the same gene (e.g., Table 3, nos. 8 and 26). This problem becomes even more evident for contigs with missing or poorly supported title-line annotation. For example, based on chromosomal location, the unidentified unspliced cluster (Table 3, no. 32) is likely part of the 3′ UTR of the contig identified as similar to MALAT1 (Table 3, no. 2), whose homology is itself only weakly supported. We are currently establishing an upgraded relational database that will improve annotation by combining title-line annotation and mapping data with expression profiles.
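Because no algorithm yet groups nonoverlapping sequences into genes automatically, one first-pass heuristic is to merge contigs whose genomic hits fall close together on the same chromosome and strand, as in the MALAT1 example. This is a minimal sketch, not the authors' procedure: the Contig record, the group_by_location function, the 10-kb gap threshold, and the coordinates are all hypothetical, and strand information is assumed to be available for every contig.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contig:
    """Genomic placement of one library contig (hypothetical record layout)."""
    name: str
    chrom: str
    strand: str  # "+" or "-"; assumed known for every contig
    start: int
    end: int


def group_by_location(contigs: list[Contig], max_gap: int = 10_000) -> list[list[str]]:
    """Merge contigs lying within max_gap bp of each other on the same chromosome and strand."""
    ordered = sorted(contigs, key=lambda c: (c.chrom, c.strand, c.start))
    groups: list[list[str]] = []
    current_key, current_end = None, 0
    for c in ordered:
        key = (c.chrom, c.strand)
        if key == current_key and c.start - current_end <= max_gap:
            groups[-1].append(c.name)  # close enough: same candidate gene
            current_end = max(current_end, c.end)
        else:
            groups.append([c.name])  # start a new candidate gene
            current_key, current_end = key, c.end
    return groups


# Hypothetical coordinates: an unspliced cluster lying 2.5 kb downstream of a contig.
contigs = [
    Contig("contig_z", "chrB", "+", 200, 900),
    Contig("contig_x", "chrA", "+", 1_000_000, 1_006_000),
    Contig("cluster_y", "chrA", "+", 1_008_500, 1_009_000),
]
groups = group_by_location(contigs)
print(groups)
```

A threshold that is too large will merge neighboring genes, so any grouping of this kind would still need to be checked against splicing evidence and expression data.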
A previous study based on human NEIBank EST entries lists the 30 most abundant retinal genes in humans [23], 17 of which were present in our canine retinal library. However, because of successful normalization, only one of these (retinal S-antigen) is listed among the 47 most abundant genes in the present database. We further assessed the relevance of the library to retinal function by comparing the distribution of unique entries across nine functional Gene Ontology (GO) classes with that of all annotated mouse and human RefSeq sequences (Fig. 1). In addition to an overrepresentation of genes associated with eye function and development, the relative increase in genes related to chaperone and ribosomal functions is consistent with known molecular mechanisms in the retina. Even within the 47 most abundant genes of the retinal library (Table 3), 5 are expressed only in the retina, and 12 have ribosomal functions. Seven genes in this group play an important role in mitochondrial function, reflecting the high energy demands of retinal metabolism, and another seven are associated with DNA or RNA processing. Surprisingly, two of the highly abundant genes in the library are noncoding. One of these, RNU47, is a well-described small nucleolar RNA, but little is known about the second commonly observed cluster (homologous to MALAT1). This noncoding RNA transcript has been reported in Macaca mulatta to modulate proliferation of retinal neuroprogenitor cells in primate experimental myopia (Tkatchenko AV, Walsh PA, Tkatchenko TV, Gustincich S, Raviola E, unpublished data, 2005; GenBank accession number DQ148151). Of these 47 genes, only 8 had previously been cloned in the dog. Although most have been predicted from genomic data, neither of the noncoding genes has been identified in the dog, underscoring the need for accurate records of retina-expressed genes to elucidate molecular functions.
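The GO comparison amounts to asking whether a functional class makes up a larger share of the library than of the reference RefSeq set. A minimal sketch of that ratio follows; the class labels and counts are hypothetical, only three of the nine classes are shown, and the text does not state which statistic, if any, was used to test significance.

```python
def relative_representation(library: dict[str, int], reference: dict[str, int]) -> dict[str, float]:
    """Ratio of each GO class's share in the library to its share in the reference set."""
    lib_total = sum(library.values())
    ref_total = sum(reference.values())
    ratios: dict[str, float] = {}
    for go_class, ref_count in reference.items():
        if ref_count == 0:
            continue  # class absent from the reference: ratio undefined
        lib_share = library.get(go_class, 0) / lib_total
        ratios[go_class] = lib_share / (ref_count / ref_total)
    return ratios


# Hypothetical counts of unique entries per GO class.
library = {"eye function and development": 30, "ribosomal": 60, "other": 910}
reference = {"eye function and development": 100, "ribosomal": 300, "other": 19_600}
ratios = relative_representation(library, reference)
for go_class, ratio in ratios.items():
    print(f"{go_class}: {ratio:.2f}")
```

Ratios above 1 indicate overrepresentation in the library; a formal test (for example, Fisher's exact test per class) would be needed before calling any difference significant.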
Assessment of the small subset of clones available for the remaining five normalized and subtracted libraries revealed a substantial overlap (30.6% to 49.3%) with the normalized retinal EST library. However, no bias toward any particular functional class has been observed. Subtraction of cDNA libraries largely eliminates transcripts shared by the two populations being compared; subtraction libraries should therefore be highly enriched for transcripts that are substantially reduced in or missing from the subtracted population. We are investigating the function of the clones enriched in the respective subtraction libraries to determine whether they reflect biochemical or molecular pathways that are specific to photoreceptors or altered with the onset of prcd. To facilitate the generation of an integrated retinal gene library, clones characterized in this study have been used as a driver for subtraction from the normalized retinal library, reducing the redundancy of the library to 7% with an estimated 20% overlap with the original library. This process will allow us to add missing genes to the database continuously. At the same time, genes known to be critical to retinal function but missing from the current library are being cloned and added manually to complete a canine retinal gene catalog.
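The overlap percentages can be read as the fraction of a library's unique entries that also occur in the normalized retinal library, and the intended effect of subtraction resembles a set difference. The sketch below illustrates only that set logic under those assumptions; physical subtraction by hybridization is not an exact set difference, so the in_silico_subtract function and the gene identifiers are hypothetical illustrations rather than a model of the wet-lab step.

```python
def overlap_fraction(library_genes: set[str], reference_genes: set[str]) -> float:
    """Fraction of a library's unique entries that also occur in the reference library."""
    if not library_genes:
        raise ValueError("library_genes is empty")
    return len(library_genes & reference_genes) / len(library_genes)


def in_silico_subtract(tester: set[str], driver: set[str]) -> set[str]:
    """Entries of the tester that are absent from the driver (idealized subtraction)."""
    return tester - driver


# Hypothetical unique-entry sets for two libraries.
normalized_lib = {"geneA", "geneB", "geneC", "geneD"}
subtracted_lib = {"geneD", "geneE", "geneF", "geneG"}
print(f"overlap: {overlap_fraction(subtracted_lib, normalized_lib):.1%}")
print(sorted(in_silico_subtract(subtracted_lib, normalized_lib)))
```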
The database can be searched through a Web browser (www.bioinformatics.upenn.edu/canine_retinalESTs/ provided in the public domain by the University of Pennsylvania, Philadelphia, PA) by clone ID, keywords in the title-line annotation, chromosomal location, or BLASTN. Individual clone data sheets contain the current results and, for clones included in the RH mapping or cDNA microarray expression projects, hyperlinks to the respective results. Combining the annotation, location, and expression data of clones provides a simple way to select new splice variants or novel candidate genes by their characteristics. Clones are readily available to other investigators on request. Beyond its obvious value for canine retinal disease studies, the library will also help refine the synteny of the canine genome with other species of interest to retinal research and comparative genomics. We consider these results an important first step toward an integrated network for gene identification and expression patterns relevant to developmental and degenerative processes of the retina, and we will continue to update this information and link our data with other existing tools.
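The search modes described for the database map naturally onto a relational query. The sketch below uses Python's built-in sqlite3 module with a hypothetical single-table schema; the table and column names, clone IDs, and coordinates are invented for illustration, since the actual database design is not described here. It combines clone ID, title-line keyword, and chromosomal-interval filters.

```python
import sqlite3

# Hypothetical schema: one row per clone with its title-line annotation
# and the location of its genomic hit.
SCHEMA = """
CREATE TABLE clones (
    clone_id  TEXT PRIMARY KEY,
    title     TEXT,
    chrom     TEXT,
    pos_start INTEGER,
    pos_end   INTEGER
);
"""


def search(conn, clone_id=None, keyword=None, chrom=None, start=None, end=None):
    """Return clone IDs matching every filter that is supplied."""
    clauses, params = [], []
    if clone_id is not None:
        clauses.append("clone_id = ?")
        params.append(clone_id)
    if keyword is not None:
        clauses.append("title LIKE ?")  # SQLite LIKE is case-insensitive for ASCII
        params.append(f"%{keyword}%")
    if chrom is not None:
        clauses.append("chrom = ?")
        params.append(chrom)
    if start is not None and end is not None:
        clauses.append("pos_end >= ? AND pos_start <= ?")  # interval overlap
        params.extend([start, end])
    # Only fixed clause text is interpolated; all values are bound parameters.
    where = " AND ".join(clauses) or "1"
    sql = f"SELECT clone_id FROM clones WHERE {where} ORDER BY clone_id"
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO clones VALUES (?, ?, ?, ?, ?)",
    [
        ("clone001", "similar to rhodopsin", "chrA", 100, 900),
        ("clone002", "similar to retinal S-antigen", "chrA", 5000, 5600),
        ("clone003", "no annotation", "chrB", 100, 400),
    ],
)
print(search(conn, keyword="S-antigen"))
print(search(conn, chrom="chrA", start=0, end=1000))
```

Extending the sketch toward the integrated resource described above would mean adding mapping and expression tables keyed on clone_id and joining them in the same query.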