May 2005
Volume 46, Issue 5
Biochemistry and Molecular Biology  |   May 2005
The Lacrimal Gland Transcriptome Is an Unusually Rich Source of Rare and Poorly Characterized Gene Transcripts
Author Affiliations
  • Aylin M. Ozyildirim
    Department of Cell Biology, University of Virginia, Charlottesville, Virginia
  • Graeme J. Wistow
    Section on Molecular Structure and Function, National Eye Institute, Bethesda, Maryland
  • James Gao
    Section on Molecular Structure and Function, National Eye Institute, Bethesda, Maryland
  • Jiahu Wang
    Department of Cell Biology, University of Virginia, Charlottesville, Virginia
  • Douglas P. Dickinson
    Department of Oral Biology, Medical College of Georgia, Augusta, Georgia
  • Henry F. Frierson, Jr
    Department of Pathology, University of Virginia, Charlottesville, Virginia
  • Gordon W. Laurie
    Department of Cell Biology, University of Virginia, Charlottesville, Virginia
Investigative Ophthalmology & Visual Science May 2005, Vol.46, 1572-1580. doi:https://doi.org/10.1167/iovs.04-1380
      Aylin M. Ozyildirim, Graeme J. Wistow, James Gao, Jiahu Wang, Douglas P. Dickinson, Henry F. Frierson, Gordon W. Laurie; The Lacrimal Gland Transcriptome Is an Unusually Rich Source of Rare and Poorly Characterized Gene Transcripts. Invest. Ophthalmol. Vis. Sci. 2005;46(5):1572-1580. https://doi.org/10.1167/iovs.04-1380.

      © ARVO (1962-2015); The Authors (2016-present)

Abstract

Purpose. To sequence and comprehensively analyze human and mouse lacrimal gland transcriptomes as part of the NEIBank project.

Methods. cDNA libraries generated from normal human and mouse lacrimal glands were sequenced and analyzed by PHRED, RepeatMasker, BLAST, and GRIST. Human “lacrimal-preferred genes” and putative gene regulatory elements were respectively identified in UniGene and ConSite, and gene clustering was analyzed by chromosomal mapping. “Hypothetical proteins,” identified by keyword search, were verified by genomic alignment and queried in the Conserved Domain database and GEO Profiles.

Results. The top six transcripts in human and mouse differed, revealing a previously unappreciated molecular divergence. The human transcriptome is enriched with transcripts from 29 lacrimal-preferred genes and has a proportionally greater content of poorly characterized hypothetical proteins than any other tissue. Only 45% of lacrimal-preferred genes, but 71% of hypotheticals, have mouse orthologs. Many of the latter display apparently altered cancer expression in the CGAP SAGE library collection, often in keeping with predicted WD40, protein kinase, Src homology 2 and 3, RhoGEF, and pleckstrin homology domains involved in cell signaling. At the genomic level, lacrimal-expressed genes show some evidence of clustering, particularly on human chromosomes 9 and 12. Binding sites for TFAP2A, FOXC1, and other transcription factors are predicted.

Conclusions. Interspecies divergence cautions against use of mouse models of human dry eye syndromes. Lacrimal-preferred and hypothetical proteins, gene clustering, and putative gene regulatory elements together provide new clues for a molecular understanding of lacrimal gland function and mechanisms of coordinated tissue-specific transcriptional regulation.

Genomic sequencing has provided a wealth of new opportunities in the exploration of biological systems and treatment of disease. Much has been dependent on the availability of large and diverse collections of cDNAs and protein databases that build sequence alignment 1 into the process of gene recognition. 2 Novel genes have been identified, and simultaneous monitoring under different experimental settings has led to a growing appreciation of the genetic robustness 3 and conservation of organisms, organs, and cellular systems. Particularly intriguing are those predicted genes in complete genomes that remain hypothetical or poorly represented in expressed sequence databases, and about which little is known regarding their normal function or disease relevance. 
Exocrine and endocrine glands are highly specialized tissues that have served historically as platforms for protein and gene discovery, leading to the identification of new growth factor families (i.e., epidermal growth factor [EGF], 4 nerve growth factor [NGF] 5 from salivary gland, and fibroblast growth factor [FGF] from pituitary gland 6 ), as well as hormones and other prominent protein classes. Some glandular proteins are less readily detected elsewhere or can display distinctively altered expression and serve as tumor markers—most notably estrogen receptor 1 in breast cancer. Recently lacritin, 7 a tissue-preferred transcript from the lacrimal gland, has been suggested to be a SAGE (serial analysis of gene expression) marker for circulating breast cancer cells. 8 Such tissues therefore represent potential expression sites for predicted mitogens, receptors, and signaling mediators that may have important disease applications. 
Knowledge of glandular gene products essential for normal tissue function and an understanding of the mechanisms governing their expression are prerequisites for the development of future therapeutics. 9 10 Such treatment is potentially applicable to a wide variety of diseases or injuries. Dry eye, the most common eye disease, 11 appears to be highly suited to this approach, as the affected lacrimal glands are readily accessible to manipulation, and the composition of basement-membrane–adherent secretory cells is highly homogeneous. 12 We report on the first large-scale sampling of human and mouse lacrimal gland transcriptomes. Rare transcripts and those corresponding to hypothetical proteins predicted from genome sequence are proportionally more abundant than in any other organ. Many display altered expression in carcinoma of the breast, prostate, and other organs. In addition, fundamental molecular differences between human and mouse glands are revealed. Chromosomal distribution and transcriptional restriction combined with ConSite analyses unveil the first predictions of lacrimal multi- and single-gene transcriptional regulation.
Materials and Methods
Tissue Specimens and Histologic Analysis
Normal human lacrimal glands (1 × 1.5 cm) of one male (75 years) and one female (51 years) donor were collected anonymously by the Mid-Atlantic Division of the Cooperative Human Tissue Network (CHTN), funded by the National Cancer Institute (Frederick, MD). Time from death to snap freezing was 6 hours 5 minutes and 7 hours 45 minutes, respectively. To assess the normalcy of the human specimens, frozen sections were stained with eosin and microscopically examined (Fig. 1). This research adhered to the tenets of the Declaration of Helsinki and was considered exempt by the University of Virginia Human Investigation Committee. Murine exorbital lacrimal glands were collected from 75 C57Bl6 male mice (7–8 weeks) and immediately frozen. Care was taken to avoid contamination with salivary (and intraorbital Harderian) glands. Animal use adhered to the ARVO Statement for the Use of Animals in Ophthalmic and Vision Research.
Library Construction and Sequencing
cDNAs were prepared from purified polyA+ RNAs (male and female human polyA+ RNAs combined; all mouse polyA+ RNAs) with a NotI primer-adapter (5′-pGACTAGTTCTAGATCGCGAGCGGCCGCCC(T)15-3′) for first-strand synthesis and directionally cloned into pCMVSPORT6 (Invitrogen-Life Technologies; Carlsbad, CA) by Bioserve Biotechnology (Laurel, MD) according to protocols in the manufacturer’s instruction manual (SuperScript Plasmid System; Invitrogen-Life Technologies; http://www.invitrogen.co.jp/products/pdf/custom%20_pCMVSPORT6.1library_man.pdf). Sequencing was performed by the NIH Intramural Sequencing Center. The first (oj1–oj32) and second (oj33–oj59) rounds of human lacrimal sequencing and a single round of mouse (ou) sequencing respectively yielded 2906, 2392, and 2238 high-quality useable sequences averaging 517, 535, and 323 bp in length. Some sequences (ni01–ni08) obtained from a previously prepared human lacrimal gland library 13 (565 useable sequences averaging 319 bp) were not included in subsequent analyses, with the exception of those involving UniGene. 
Computational Analyses
Sequences were systematically analyzed by using a stepwise approach 14 15 first with PHRED (CodonCode Corp., Dedham, MA), 16 then RepeatMasker (exclusion of non-mRNA sequences derived from vector, ribosomal, or mitochondrial nucleic acid; http://ftp.genome.washington.edu/cgi-bin/RepeatMasker/ provided in the public domain by the University of Washington Genome Center, Seattle, WA), followed by BLAST 17 and GRIST (GRouping and Identification of Sequence Tags 15 ; http://neibank.nei.nih.gov/index.shtml; BLAST, UniGene, and GenBank are provided by NIH at www.ncbi.nlm.nih.gov/). GRIST assembles identical or near-identical expressed sequence tags (ESTs) into tables with the UniGene name, chromosomal location, and cellular role, according to functional categories designated by the Gene Ontology Project (GO; http://www.geneontology.org). For a simplified graphic representation, GO Slim terms were used. All lacrimal sequences have been deposited in GenBank (accession numbers: CA946271–CA946547; CD721760–CD724311; CK429154–CK431261; CK615661–CK617067) and are available in GRIST format at NEIBank (http://neibank.nei.nih.gov/index.shtml). The accession number of the novel cDNA lacrein is AY702028. 
Hypothetical Protein Analyses
Multiorgan comparison of sequences coding for hypothetical proteins took advantage of Entrez Nucleotide’s Preview/Index (www.ncbi.nlm.nih.gov) with keyword searches such as human[Organism] AND /tissue_type=“brain” AND hypothetical, which draws from all mRNAs. Selecting only sequences from gbEST (gbdiv_est[prop]) was considered less reliable. Several organs were omitted from final statistical analysis due to small sample size (human pituitary gland; mouse cervix, cord blood, endometrium, head_neck, larynx, and nasopharynx). Identification and tabulation of lacrimal hypothetical sequences emerged from GRIST. Because none has yet been annotated as such in GenBank, it was not possible to use Entrez Nucleotide. Instead, we searched our sequences in NEIBank (lacrimal “oj” library) using “hypothetical” as a keyword. For authentication, all human and mouse ESTs designated as coding for hypothetical proteins were individually mapped to human or mouse genomes by BLAST or the Map Viewer link in Entrez Nucleotide. Statistical analyses were performed by Prism (GraphPad, San Diego, CA), with results expressed as the mean ± SD. 
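The keyword searches described above follow a fixed pattern, so they can be assembled programmatically for each organ. The following is a minimal sketch only: the helper name `build_hypothetical_query` and its parameters are illustrative assumptions, and only the query syntax itself is taken from the example in the text. The optional flag appends the `gbdiv_est[prop]` restriction that was considered less reliable.

```python
def build_hypothetical_query(organism: str, tissue: str, est_only: bool = False) -> str:
    """Build an Entrez Nucleotide keyword query of the form quoted in the text.

    The function name and arguments are hypothetical; the term syntax
    (organism field, /tissue_type qualifier, keyword) follows the example
    given in the Methods.
    """
    terms = [
        f"{organism}[Organism]",
        f'/tissue_type="{tissue}"',
        "hypothetical",
    ]
    if est_only:
        # Restrict to the EST division (considered less reliable in the text).
        terms.append("gbdiv_est[prop]")
    return " AND ".join(terms)


# Example: the brain query quoted in the Methods.
brain_query = build_hypothetical_query("human", "brain")
```

Generating the query string per tissue keeps the search terms identical across organs, which matters when the resulting counts are compared as proportions.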
Chromosomal Distribution, UniGene Analyses, ConSite, and SAGE Comparison
To identify lacrimal ESTs with limited expression in other organs (“lacrimal-preferred” genes), we manually inspected each UniGene entry, selecting those human genes currently listed in six or fewer other organs. Cell line, tumor, pooled organ, blood, and placenta expression was excluded. To search for conserved gene regulatory sites in lacrimal-preferred genes, we extracted 2 kb immediately upstream of the ATG translation initiation codon, including part of the first exon from both human and mouse genomes by Map Viewer and Entrez Nucleotide, and searched for putative human transcription factor binding sites by ConSite (http://www.phylofoot.org/ 18 ) using default settings. ConSite integrates Bayesian phylogenetic footprinting 19 with sequence profiles of known binding sites. 18 To search for potential cancer relevance, we queried the Cancer Genome Anatomy Project 20 21 collection (GDS217) in GEO Profiles (National Center for Biotechnology Information [NCBI], Bethesda, MD) with each human hypothetical protein, averaged normalized tags per million values for individual normal or cancer tissue categories, and determined the increase or decrease (x-fold) in cancer. Similar screens were systematically performed of the cancer microarray data sets GDS2, GDS8, GDS9, GDS73-GDS75, GDS84, and GDS88-GDS90. Code was written to display the chromosomal distribution of human cDNAs at 39-Mb intervals on human genome build 34 (version 1). 
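The SAGE comparison described above (averaging normalized tags per million within the normal and cancer categories, then expressing the difference as an x-fold increase or decrease) can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: the function names, the signed convention for decreases, and the handling of zero means are all assumptions, and the twofold threshold is borrowed from the cutoff reported in the Results.

```python
from statistics import mean


def fold_change(normal_tpm, cancer_tpm):
    """Signed fold change of cancer versus normal mean tags per million.

    Positive values are increases in cancer, negative values are decreases.
    Returns None when either mean is zero (an assumed convention, since a
    ratio is undefined in that case).
    """
    normal_mean = mean(normal_tpm)
    cancer_mean = mean(cancer_tpm)
    if normal_mean == 0 or cancer_mean == 0:
        return None
    if cancer_mean >= normal_mean:
        return cancer_mean / normal_mean
    return -(normal_mean / cancer_mean)


def classify(change, threshold=2.0):
    """Label a signed fold change using a twofold cutoff."""
    if change is None:
        return "undetermined"
    if change >= threshold:
        return "increased"
    if change <= -threshold:
        return "decreased"
    return "unchanged"


# Invented tag counts for illustration only (not data from the study).
example = classify(fold_change([10, 20], [60, 60]))
```

A signed value keeps increases and decreases on the same scale, so a 4-fold increase and a 4-fold decrease are reported as 4.0 and -4.0 rather than 4.0 and 0.25.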
Results
Highest-Abundance Genes in the Two Species
The polarized secretory acinar cells (Fig. 2), which make up much of the cellular mass of the adult human or mouse lacrimal gland, release a rich array of proteins that prevent bacterial infection and nurture the avascular but highly innervated cornea. Other key protein secretory cells include duct and endothelial cells, and lymphocytes. Sequencing of more than 5000 human and 2000 mouse lacrimal cDNAs revealed a diverse collection of expressed genes. The most highly expressed genes differed between species. In humans, cDNAs encoding the secretory proteins lysozyme (LYZ; hydrolysis), proline rich 4 (PRR4; function unknown), lipocalin 1 (LCN1; lipid, hydrophobic carrier 22 ), lactotransferrin (LTF; iron binding), proline rich 1 (PROL1; function unknown), and lacritin (LACRT; prosecretory mitogen 7 ) were most abundant. However, the murine top six transcripts corresponded to odorant binding protein 1a (Obp1a; contains a large lipocalin domain); allergen dI chain C2C (C2c; function unknown); a novel hypothetical protein that we refer to as lacrein, which is highly restricted to lacrimal gland and bears some similarities (17.4% identical; 32% similar according to near-optimal alignment 23 ) to human lacritin; salivary protein 1 (Spt1; function unknown); a putative hydrolytic enzyme similar to triacylglycerol lipase (gastric precursor; Lipf); and a unique chromosome 17 UniGene cluster restricted to lacrimal and pituitary glands. Such striking dissimilarity suggests a previously unappreciated molecular divergence not reflected in comparison of all human and mouse lacrimal cDNAs by higher-order GO Slim terms (Fig. 2B).
Enrichment with Poorly Characterized Hypothetical Proteins
In addition to these highly expressed but incompletely understood proteins, numerous human lacrimal cDNAs code for proteins designated as hypothetical, a term originating from their initial identification by informatics from the genome. Many of these rare, poorly characterized, or only recently characterized gene products now derive from confirmed gene models well supported by multiple ESTs. Some hypotheticals have names. Most, however, have not been subjected to functional analyses in biological systems, despite often intriguing domain predictions in Entrez Gene. A keyword search of the human lacrimal library revealed 238 distinct hypothetical proteins (Table 1) of 2022 GRIST clusters of similar or identical sequences, that is, 4.5% of all 5298 human lacrimal cDNAs. Of those, 62% corresponded to the FLJ series of newly predicted, full-length transcripts, although the lacrimal gland was not one of the 107 human tissues sequenced. 24 In contrast, hypotheticals account for 0.6% ± 0.7% (median: 0.2%, range: 0%–2.2%) of cDNAs (shown in italics in Table 2) in 22 similar-sized GenBank organ pools represented by >1000 and <6670 cDNAs. Analogous mean proportional values are obtained in organs of human or mouse (0.5% ± 0.6%) when no restriction is placed on the number of cDNAs (Table 2), in agreement with 0.4% for the whole human or mouse transcriptomes (currently 31,669 hypothetical of 8,576,334 [human]; and 26,636 of 6,673,691 [mouse]). Other organs with elevated hypothetical levels include human testis (2.8%), spinal cord (2.2%), and mouse lens (2.2%), but not mouse lacrimal gland (Tables 1, 2), although manual inspection of the latter suggests numerous poorly characterized genes. To assess EST and gene legitimacy, we aligned each EST to human genomic sequence according to BLAST or the Map Viewer link in Entrez Nucleotide. We also checked for mouse orthologs (Table 1) and evaluated whether open reading frames generally exceeded 100 amino acids.
In keeping with 3′ sequencing, 82% (196; 3.7% of 5298) aligned to exons in the 3′ half of confirmed gene models or are predicted to achieve the same by alternative splicing; and 71% have a mouse ortholog. Those remaining aligned to intergenic regions (8%), introns (2%; possible nonmessenger RNAs 25 ) or pseudogenes (0.8%; Table 1). The mean open reading frame size for genes of 3′ half exonic alignment was 523 amino acids. Five have open reading frames of 101 amino acids or less, but all are predicted to be protein coding and only one is not highly conserved in mouse. This relative enrichment in largely uncharacterized hypothetical proteins (as suggested by exon alignment of ESTs from genes largely represented also in mouse) could in part reflect restricted expression and the poorly characterized nature of the lacrimal gland. Four are moderately rare transcripts (described later); however, most display wide (mean of 21 different organs) and abundant expression (mean of 235 ESTs per organ), as determined by manual inspection of UniGene. The protein domain structure and disease database analyses described in the following sections assess their relevance, and their genomic organization suggests mechanisms of gene regulation.
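The proportions quoted in this section can be checked directly from the counts given in the text. The short calculation below is offered only as a worked check; the variable names are illustrative, and every count is taken from the paragraphs above.

```python
def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Counts as stated in the text.
HYPOTHETICALS = 238          # distinct hypothetical proteins in the human library
HUMAN_LACRIMAL_CDNAS = 5298  # all human lacrimal cDNAs
CONFIRMED = 196              # hypotheticals aligning to 3' half exons

share_of_library = percent(HYPOTHETICALS, HUMAN_LACRIMAL_CDNAS)      # reported as 4.5%
confirmed_share = percent(CONFIRMED, HYPOTHETICALS)                  # reported as 82%
confirmed_of_library = percent(CONFIRMED, HUMAN_LACRIMAL_CDNAS)      # reported as 3.7%

# Whole-transcriptome background proportions, both reported as 0.4%.
human_background = percent(31_669, 8_576_334)
mouse_background = percent(26_636, 6_673_691)
```

Rounded to the precision used in the text, each value agrees with the reported figure.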
Signaling Domains
To look more closely at the types of hypothetical proteins expressed, all 196 confirmed sequences were queried against Gene (NCBI) with particular attention to Conserved Domain Database designations. Of those, 67% were represented by at least one domain, for a total of 196 different domains, including 17% designated as uncharacterized conserved protein domains. Zinc finger domains of the C2H2, C2C2, DHHC, and unnamed types were the most prevalent (Fig. 3), representing 4% of the total. Relatively common as a group (11%) were domains involved in protein–protein interactions during signal transduction, including WD40, protein kinase, Src homology 2 and 3, RhoGEF, and pleckstrin homology domains. Notable examples include FLJ33962 (RhoGEF 19) and FLJ31208, which contain one of each Src homology domain. FLJ20275 (transducer of Cdc42-dependent actin assembly; TOCA1) features single RhoGEF and Rho binding protein kinase C-related kinase homology region 1 domains. Also expressed are new members of the serine/threonine kinase (FLJ25006) and Rab1 small G protein (FLJ10101) families; a Src homology 2 domain protein (FLJ20967; SH2D4A); a ubiquitin homolog (DkFZp434N1923); a Ran binding protein (FLJ22794); a cytoplasmic dynein intermediate chain homolog (FLJ10300); and a predicted cyclin domain protein (FLJ40432). With the exception of FLJ10300, all the latter have mouse orthologs.
Gene Clustering and Higher-Order Transcriptional Regulation
Linking global expression data with genomic sequence makes possible a search for chromosomal regions potentially regulated by tissue-specific mechanisms, such as enhancers. We mapped all detected hypothetical and known human genes by expression level to chromosomes segmented into 39-Mb bins (Fig. 4). Regions on chromosomes 12 (0–79 Mb) and 9 (120–136 Mb) were particularly active (Fig. 4, filled bars), and although both are gene rich, there is no particular enrichment for lacrimal gland genes (Fig. 4, unfilled bar). The proximal half of chromosome 12 is distinguished by elevated expression of LYZ, PROL4, LACRT, NR4A2, and AQP5, and includes C2F, FLJ10292, FLJ10298, FLJ10652, FLJ11773, KIAA0528, KIAA1040, KIAA1238, KIAA1536, MGC5576, MGC16044, MGC23401, and PRO1843. The GC-rich 26 distal end of chromosome 9 is the source of many LCN1 transcripts and others, including those coding for hypothetical proteins FLJ10101, LOC89958, LOC389794, and MGC10526. Also notably active was the 40- to 79-Mb region of chromosomes 3, 4, and 11. Least active were chromosomes 13, 14, 18, 21, X, and Y. Chromosomes 13, 18, 21, and Y contain several low-gene-density regions. Recent comparison to other NEIBank libraries (Gao J, Wistow G, unpublished data, 2005) reveals that genomic regions of heightened gene activity often differ among tissues.
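The segmental mapping described above amounts to summing EST counts within fixed-width chromosomal intervals. The sketch below shows one way this could be done; it is not the code written for the study. The function names, the record layout, and the choice to start the first bin at position 0 are assumptions, and the example records are invented placeholders rather than real gene coordinates.

```python
from collections import Counter

BIN_SIZE_MB = 39  # interval width stated in the Methods; bin boundaries are an assumption


def bin_key(chromosome, position_bp, bin_size_mb=BIN_SIZE_MB):
    """Return (chromosome, zero-based bin index) for a genomic position."""
    return chromosome, int(position_bp // (bin_size_mb * 1_000_000))


def expression_by_bin(records, bin_size_mb=BIN_SIZE_MB):
    """Sum EST counts per chromosomal bin.

    records: iterable of (chromosome, start_bp, est_count) tuples.
    """
    totals = Counter()
    for chromosome, start_bp, est_count in records:
        totals[bin_key(chromosome, start_bp, bin_size_mb)] += est_count
    return totals


# Invented records for illustration only.
records = [
    ("12", 69_000_000, 100),
    ("12", 70_000_000, 50),
    ("9", 135_000_000, 20),
]
totals = expression_by_bin(records)
```

Weighting each gene by its EST count, rather than counting genes, is what allows an active region to be distinguished from one that is merely gene rich.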
To examine whether any of these genes was subject to higher-order regulation that drove expression specifically to the lacrimal gland, advantage was taken of UniGene, which now contains almost 6 million ESTs from different human tissues. We manually scanned all human lacrimal genes in UniGene, recording the number of organs and ESTs per organ. Excluded were pathologic, pooled placental, blood, or fetal tissues and samples derived from cell lines. Twenty-nine lacrimal-preferred genes were identified for which UniGene expression is limited to the human lacrimal gland and no more than six other organs. The mean number of other organ expressors and ESTs per organ is respectively 3 and 23, indicating relatively high organ specificity and low or rare transcription, with lacrimal transcription exceptionally elevated for lacritin, lipocalin 1, and proline-rich 1 (82, 218, and 111 ESTs, respectively). Four of 29 are hypotheticals (FLJ12205, FLJ20513, LOC221711, and LOC340385). UDP-N-acetyl-α-d-galactosamine:polypeptide N-acetylgalactosaminyltransferase (GALNT8), histone 1 H2 (HIST1H2AH), lacritin (LACRT), proline rich 1 (PROL1), and secretoglobin family 1D member 1 (SCGB1D1) were not detected elsewhere, in keeping with recent large scale microarray analyses (data sets GDS181, GDS594, and GDS596 27 28 ) and SAGE (Cancer Genome Anatomy Project). 
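The selection rule for lacrimal-preferred genes was applied by manual inspection of UniGene, but the criterion itself is simple enough to state as code. The following sketch assumes a hypothetical input, a set of expression-source labels per gene; the label strings and function names are illustrative and are not UniGene field values.

```python
LACRIMAL = "lacrimal gland"
# Source categories excluded from the organ count (labels are assumed).
EXCLUDED_SOURCES = frozenset({"cell line", "tumor", "pooled organ", "blood", "placenta"})
MAX_OTHER_ORGANS = 6  # "no more than six other organs"


def other_organs(sources):
    """Organs other than lacrimal gland, after removing excluded categories."""
    return {s for s in sources if s != LACRIMAL and s not in EXCLUDED_SOURCES}


def is_lacrimal_preferred(sources):
    """True if expressed in lacrimal gland and in at most six other organs."""
    sources = set(sources)
    return LACRIMAL in sources and len(other_organs(sources)) <= MAX_OTHER_ORGANS


# Invented example: tumor expression is ignored, leaving one other organ.
example = is_lacrimal_preferred({"lacrimal gland", "salivary gland", "tumor"})
```

Applying the exclusions before counting organs matters: a gene seen in lacrimal gland, one normal organ, and several tumor libraries still qualifies under this rule.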
The chromosomal placement of lacrimal-preferred genes revealed that many are clustered (Fig. 4, arrows), an arrangement suggestive of some coordinated transcription. FLJ20513, HTN1, and paralogous PROL1 and PROL3/5 are grouped within 0.36 Mb on chromosome 4 with respectively two and three genes separating HTN1 from FLJ20513 and FLJ20513 from PROL3/5. The clustered PROL multigene family on chromosome 4 and the clustered CST multigene family on chromosome 20 function as regions of correlated transcription in the salivary gland, based on microarray analysis of a collection of human organs that excluded lacrimal gland (supplemental human RCT data files, Ref. 28, p 95). On chromosome 11, paralogs SCGB1D1 and SCGB2A2 are separated by 0.084 Mb in a cluster with other family members. In contrast, chromosome 19 lacrimal-preferred genes are not clustered in the same chromosomal regions. We queried each lacrimal-preferred protein in the EMBL STRING database (available at http://www.embl-heidelberg.de/; provided in the public domain by the European Molecular Biology Laboratory, Heidelberg, Germany). 29 STRING seeks to integrate known and predicted protein interactions, but no evidence of coexpression could be found. This result is in keeping with the absence of any human lacrimal gland microarray data; however, salivary gland hits might be expected from the SymAtlas 27 28 microarrays (http://www.gnf.org/ provided by the Genomics Institute of the Novartis Research Foundation [GNF], San Diego, CA).
BLAST of the current mouse genome (build 33) revealed that only 13 (45%) of 29 human lacrimal-preferred genes have obvious orthologs in mouse. No conservation of clustering was apparent. Notable differences between the genes activated in mouse and in human were consistent with the much wider tissue distribution in mouse (Fig. 5, inset) of transcripts defined as lacrimal-preferred in human (Fig. 5). Moreover, of the top six human (LYZ, PRR4, LCN1, LTF, PROL1, and LACRT) and mouse (Obp1a, C2c, lacrein, Spt1, Lipf, and the chromosome 17 UniGene cluster) genes, only lysozyme (LYZ), lactoferrin (LTF), and gastric lipase (Lipf) are known to be shared between species.
Key elements of tissue-specific gene regulation are often well conserved among different species and can be revealed by phylogenetic footprinting. 18 We therefore performed ConSite analysis of all but one (OSCAR mouse gene not yet placed) of the human–mouse lacrimal-preferred ortholog pairs, using 2 kb upstream of the ATG translation start site. Putative GATA2, GATA3, SPI-1, SPI-B, and ZNF42 sites are represented most frequently (8–10 of 12 gene pairs), followed by lesser representation (3–5 gene pairs) of putative TFAP2A (AP-2α), ELK1, FOXF2, FOXF1, FOXC1, FOXL1, SRY, HAND1, and YY1 sites and a few others (2 or fewer gene pairs). Other factors emerging from this analysis that are detected in lacrimal gland or eye include CREB1, NFIL3, ELK1, FOXF2, GATA3, NHLH1, MAX, ZNF42, HAND1, USF1, and YY1.
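The representation figures above count, for each transcription factor, the number of ortholog pairs in which at least one conserved site was predicted. A tally of that kind can be sketched as follows, assuming the ConSite predictions have already been collected into a mapping from gene pair to factor names. The function name and input layout are assumptions, and the example entries are placeholders, not results from the study.

```python
from collections import Counter


def tally_sites(predictions):
    """Count the gene pairs in which each factor has at least one predicted site.

    predictions: dict mapping a gene-pair label to an iterable of factor
    names predicted for that pair (repeats within a pair are counted once).
    """
    counts = Counter()
    for factors in predictions.values():
        counts.update(set(factors))
    return counts


# Placeholder input for illustration only.
predictions = {
    "pair_A": ["GATA2", "GATA2", "YY1"],
    "pair_B": ["GATA2"],
    "pair_C": [],
}
counts = tally_sites(predictions)
```

Collapsing repeats within a pair keeps a promoter with several predicted GATA2 sites from being counted more than once, so the result is comparable to the "of 12 gene pairs" figures in the text.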
Expression in Human Cancers
A notable aberration in the expression of lacritin, 7 the sixth most abundant and one of 29 lacrimal-preferred genes, is its reported amplification in some invasive breast cancers. 30 Indeed, the first lacritin EST in GenBank originated from a subtracted breast cancer library. 31 Other novel ESTs now serve as cancer markers. 32 To determine whether any human lacrimal hypothetical or other lacrimal-preferred proteins could be similarly involved in disease, we probed each in the Gene Expression Omnibus (NCBI), paying particular attention to the Cancer Genome Anatomy Project (CGAP) SAGE library, 20 21 where lacritin’s cancer expression has been detected. The SAGE library consists of 123 (39 normal, 84 cancer) different human tissue samples, of which 21 are breast and 10 are prostate cancer. 33 Of the 196 lacrimal-expressed hypothetical proteins discussed herein, 104 (53%) display an apparent cancer-associated expression that is at least two times greater than normal in breast, prostate (Fig. 6), brain, pancreatic, ovarian, stomach, peritoneal, and other cancers, whereas 47% (93/196) of the same and other hypotheticals display a two times or greater decrease (not shown), depending on the type of cancer. Among lacrimal-preferred genes, 16 (55%) of 29 display a similar twofold or greater increase or decrease. To estimate the frequency of cancer-related changes in expression over the whole lacrimal library, we manually queried the CGAP SAGE library with each of the first 1000 human lacrimal proteins. Of those, 11% (107) displayed a cancer increase or decrease (not shown). FLJ33962, FLJ31208, FLJ20275, and FLJ10101 are unchanged, whereas FLJ25006 (serine/threonine kinase) is increased 2-fold in breast (Fig. 6) and 12-fold in brain cancers. FLJ20967 (Src homology 2 domain) is increased 5-fold in breast and decreased 22-fold in ovarian (not shown) cancers.
DkFZp434N1923 (ubiquitin homologue) displays a 2-, 9-, 14- and 45-fold increase, respectively, in prostate, colon, stomach, and pancreatic cancers. Similarly, Ran binding protein FLJ22794 is suppressed 32-fold in ovarian and amplified 2-, 7-, and 22-fold in brain, colon, and prostate cancers, respectively. FLJ10300 (cytoplasmic dynein intermediate) is increased 6-fold in brain cancer, and the predicted cyclin domain protein FLJ40432, 7- and 16-fold greater in prostate and stomach cancers, respectively. Most elevated in breast cancer is LOC89958, a predicted 389-amino-acid protein of 42.2 kDa that lacks a signal peptide or other defined domain and is 80% (313/389 amino acids) identical with an unknown mouse protein. LOC89958 is well-represented in UniGene by >30 different cancer libraries, including mammary adenocarcinoma, and is periodically regulated during cell cycling in HeLa cells, with maximum mRNA levels during mitosis. 34 Others elevated include FLJ14668 (138 amino acids; DUF842 domain), DKFZp761D0211 (552 amino acids; well conserved with integrin α chain similarity in InterPro, one transmembrane domain and localized to the cell surface) and BC002942 (707 amino acids; 7–11 predicted transmembrane domains; http://ebi.ac.uk/interpro/ provided in the public domain by the European Bioinformatics Institute, Hinxton, UK). 
Discussion
We describe the first human and mouse lacrimal gland EST databases, developed as part of the NEIBank Project. Key observations include (1) a previously unappreciated molecular divergence between human and mouse, as evidenced by differences in the most abundant transcripts and regulation of human lacrimal-preferred genes, (2) the suggestion that many human lacrimal-preferred genes arose late in evolution, (3) the presence of a proportionally greater number of human hypothetical proteins than any other tissue, (4) evidence of human gene clustering suggesting higher-order transcriptional regulation, and (5) despite interspecies differences, hints of conserved upstream elements that set the stage for promoter analyses. Development of both databases grew from the need for a catalog of lacrimal gland transcripts as a fundamental resource for study of normal and diseased glands. 
Could molecular divergence simply reflect an interspecies disparity in organ age, cDNA library preparation, or cellular composition? We suspect that organ age is not a factor. Arguably mature secretions would similarly predominate in the mouse and human glands used. Care was taken to prepare the cDNA libraries identically with minimal manipulation, and histologic examination suggests an equal proportion of epithelial cells. Perhaps most persuasive is how divergence centers on the most abundant genes. Query of the top six human and mouse genes in a rat lacrimal gland microarray data set GDS508, 35 the only lacrimal gland data set currently in GEO Profiles, obtained hits for only LYZ and Spt1 (Gene Expression Omnibus Profiles, http://www.ncbi.nlm.nih.gov/projects/geo/ provided by NCBI). A record for Lipf was considered absent or at background levels. Numerous mouse strain-specific differences have been noted in gene expression of the brain, but many of the highest scoring hits are housekeeping genes ubiquitously expressed in other organs. 36 In contradistinction, several of the lacrimal diverging genes have no respective ortholog in mouse or human, although mouse Obp1a contains a lipocalin domain and mouse lacrein displays some homology with lacritin. Equally striking was how human lacrimal-preferred genes are broadly dispersed in mouse. Not addressed in this study of combined cDNAs from a single human male and a single female, but of considerable importance, are levels of variation among normal humans, changes in human ocular disease, and the well-known influence of sex hormones on human lacrimal gene expression, the latter well scrutinized in rodents. 37 38 Each is now addressable by multireplicate microarray, 39 and possibly also by study of single nucleotide polymorphisms in key lacrimal genes. Evolution of independent solutions for eye protection has important implications for the usage of animal models to study human dry eye syndromes.
Discerning how molecular divergence is functionally manifested highlights lacrimal-preferred proteins—particularly those lacking mouse orthologs. Predicted roles as a constituent of the extracellular matrix, as an agonist, as a mediator of signal transduction or in gene regulation are intriguing. Some appear to be expressed only in lacrimal gland, although approaches more sensitive than UniGene have revealed other expressing tissues, including some lacritin expression in thyroid and salivary gland (LACRT 7 28 ). No lacritin expression was detected in normal breast by RNA dot blot, 7 microarray, 27 28 or CGAP SAGE, 20 21 although normal breast expression is apparent by real-time PCR. 40 Also, cystatin S is expressed at high levels in the human submandibular and parotid glands, and lower levels in some other tissues, as determined by direct measurement of mRNA levels. 41 Overall, however, it is clear that human lacrimal-preferred genes display a strong predilection for lacrimal gland—a property notably absent in mouse. This necessitates renewed focus on human-based mechanistic investigation using an expanded molecular toolbox of human molecular probes and cell systems including human lacrimal epithelial and fibroblastic cell lines challenged in standard and three-dimensional culture models. 
Why is human lacrimal gland disproportionally blessed with hypothetical proteins, most of which are poorly characterized? Initial concerns about authenticity were alleviated by appropriate alignment with authentic gene models, identification of mouse orthologs, and confirmation that open reading frames generally exceeded 100 amino acids. It was recently noted 42 that pericentromeric regions can be sources of new genes derived from euchromatic transposition. We searched all 1256 genes within 5 Mb of pericentromeric regions and found 11 corresponding to lacrimal genes coding for hypothetical proteins (i.e., 6% of all hypotheticals). None are gene duplications with ancestral donors—as per those tabulated by She et al. 42 —but nonpericentromeric FLJ40432 serves as a pericentromeric donor of one exon. 42 Similarly, among rare lacrimal-preferred genes, three (CST4, KIAA1754L, SLC13A2) are pericentromeric. None are indicated to be ancestral duplicons or to serve as pericentromeric donors, although CST4 is part of a multigene clustered family that probably arose by tandem duplication and is present in rat. KIAA1754L and SLC13A2 are both present in mouse. Subtelomeric duplications and exchanges are also common in eukaryotes. 42 43 Scrutiny of the genomic distribution of genes coding for hypothetical or lacrimal-preferred proteins respectively revealed that 11% and 10% are located within 5 Mb of either telomere, versus 12.6% of all genes in the entire human lacrimal library (as determined by systematic manual analysis). Thus, no single strategy defines how hypothetical or lacrimal-preferred protein genes are derived. Apparent up- or downregulation in different cancers necessitates comprehensive probing of other cancer and normal databases to appreciate spurious versus authentic changes underlying disease progression. This rich spoil of the human genome project offers remarkable new opportunities in lacrimal and potentially cancer cell biology. 
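The subtelomeric comparison above rests on a simple positional test: whether a gene lies within 5 Mb of either end of its chromosome. That analysis was performed manually; the sketch below only illustrates the criterion. The function names and input layout are assumptions, chromosome lengths must be supplied by the caller, and the example coordinates are invented.

```python
WINDOW_MB = 5  # distance from either telomere, as stated in the text


def near_telomere(start_bp, chromosome_length_bp, window_mb=WINDOW_MB):
    """True if a position lies within window_mb of either chromosome end."""
    window_bp = window_mb * 1_000_000
    return start_bp <= window_bp or start_bp >= chromosome_length_bp - window_bp


def percent_near_telomere(genes, window_mb=WINDOW_MB):
    """Percentage of genes within the window.

    genes: non-empty list of (start_bp, chromosome_length_bp) tuples.
    """
    hits = sum(near_telomere(start, length, window_mb) for start, length in genes)
    return 100.0 * hits / len(genes)


# Invented coordinates on a hypothetical 100-Mb chromosome.
genes = [
    (3_000_000, 100_000_000),
    (50_000_000, 100_000_000),
    (97_000_000, 100_000_000),
    (40_000_000, 100_000_000),
]
share = percent_near_telomere(genes)
```

The same test, applied with centromere coordinates in place of chromosome ends, would give the corresponding pericentromeric proportion.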
Segmental mapping and interspecies alignment provided interesting gene regulatory insights. Gene clustering is suggestive of coordinated transcription. Although lacrimal-preferred genes were transcribed widely in mouse, putative transcription factor binding sites were predicted by phylogenetic footprinting [18]. The transcription factor GATA2 is widely expressed in a number of different organs, including salivary gland and eye, but is most commonly associated with cell growth during hematopoiesis [44]. TFAP2A interacts with PAX6 in ocular development, including corneal epithelial repair [45], and, like TCF8, is involved in craniofacial development [45], the latter by regulating TGFβ/BMP signaling [46]. FOXC1 regulates ocular and embryonic development [47] and is one of the transcripts detected in our human lacrimal gland library.
Conclusions
Sequencing of human and mouse cDNAs from the poorly characterized lacrimal gland provides a physiological snapshot of a polarized exocrine secretory system unusually rich in rare and hypothetical proteins. Such proteins represent one of the last frontiers of human and mouse genome biology. Many hypothetical proteins display apparently altered cancer expression in the CGAP SAGE library collection—often in keeping with the predicted WD40, protein kinase, Src homology 2 and 3, RhoGEF, and pleckstrin homology domains involved in cell signaling. At the genomic level, lacrimal-expressed genes show some evidence of clustering, particularly on chromosomes 9 and 12, and conserved upstream elements may provide new clues to mechanisms of coordinated tissue-specific transcriptional regulation. Implications for the treatment of dry eye, the most common eye disease, are considerable, because the regulation of secretory and transcriptional processes in the lacrimal gland is poorly understood. Cellular models developed around this tissue provide an outstanding platform for functional analysis by RNA interference (RNAi). 
 
Figure 1.
 
Histology of human male (A) and female (B) lacrimal glands from which cDNAs were derived for sequencing. PolyA+ RNAs from both sexes were combined for cDNA generation. Other than some collagen in the male sample, the glands appeared healthy. Frozen sections stained with eosin.
Figure 2.
 
Lacrimal glands are packed with polarized acinar cells that release nascent tear proteins into ducts via constitutive or regulated secretory pathways. (A) Representation of predominant human proteins expressed in an acinar cell and small lymphocyte based on abundance of cDNAs sequenced. Other cell types (duct and endothelial cells) contribute some proteins. Many are associated with ribosomes, endoplasmic reticulum (left), and nuclei. Shown are: (1) LYZ, (2) PROL4, (3) LCN1, (4) LTF, (5) PROL1, (6) LACRT, (7) AZGP1, (8) PIP, (9) EEF1A1, (10) SCGB2A1, (11) DDX5, (12) EHF, (13) PROL3, PROL5, (14) LMO4, (15) TXNIP, (16) PIGR, (17) HSPG2, (18) HNRPA2B1, (19) RFC4, (20) SFRS5, (21) NR4A2, (22) RPL3, (23) CCNL2, (24) XBP1, (25) DSIPI, (26) SUPT5H, (27) SAT, (28) RP3A, (29) SRRM2, (30) RPS6, (31) EPS8L2, (32) NR4A1, (33) EEF1G, (34) CST4, (35) RNP24, (36) PDCD4, (37) FOXC1, (38) RPL17, (39) SREBF1, (40) KLHDC2, (41) RPL13, (42) ANXA5, (43) CD164, (44) AQP5, (45) RPS8, (46) OGT, (47) RPL13a, (48) NFAT5, (49) SCGB1D1, (50) UBC, (51) AARS, (52) ARFGAP3, and (53) RPLP0. (B) Nonredundant GO Slim representation of 7356 human (Hu) and 1052 mouse (Ms) lacrimal cDNAs. Analysis was performed in a nonredundant manner to reflect protein complexity. The two species are functionally very similar, although the most highly expressed genes differ dramatically.
Table 1.
 
Organ Distribution of Hypothetical Proteins
| Organ | Human % Hypothetical (Total Organ cDNAs)* | Mouse % Hypothetical (Total Organ cDNAs)* |
| --- | --- | --- |
| Adrenal gland | 0.08 (20750) | 0.45‡ (4424) |
| Aorta | 0.2 (1137) | 1.1 (17396) |
| Brain | 0.6 (394140) | 0.09 (403655) |
| Breast | 0.16 (95959) | 0.2 (1066) |
| Cardiac muscle | 0.7 (278) | 0.04 (2119) |
| Cervix | 0.4 (31149) | 0 (0) |
| Colon | 0.7 (154060) | 1.0 (24022) |
| Cord blood | 0 (11406) | 0 (0) |
| Endometrium | 1.2 (14378) | 0 (98553) |
| Epididymis | 1.1 (175) | 1.2 (5280) |
| Esophagus | 1.6 (3946) | 0 (1) |
| Eye | 0.1 (253323) | 0.07 (98553) |
| Gallbladder | 0 (5) | 0.06 (1678) |
| Head/Neck | 0.75 (1596) | 0 (0) |
| Heart | 0.7 (52054) | 0.5 (70257) |
| Hypothalamus | 0.4 (35207) | 1.0 (19695) |
| Iris | 0 (4316) | 0 (0) |
| Kidney | 0.6 (141668) | 0.1 (421301) |
| Lacrimal gland† | 4.5 (5298) | 0.7 (2238) |
| Larynx | 0.9 (1156) | 0 (1) |
| Lens | 0.002 (45801) | 2.2 (1271) |
| Ligament | 0 (3939) | 0 (93) |
| Liver | 0.2 (155560) | 0.75 (57464) |
| Lung | 0.4 (323698) | 0.5 (130725) |
| Mammary gland | 0.2 (3138) | 0.9 (138491) |
| Marrow | 0.2 (47962) | 0.03 (76292) |
| Nasopharynx | 0.05 (28860) | 0 (0) |
| Nose | 0.2 (2613) | 0 (14) |
| Ovary | 0.5 (133807) | 0.8 (22751) |
| Oviduct | 0 (27) | 1.4 (6661) |
| Pancreas | 1.4 (206442) | 1.1 (49004) |
| Parathyroid | 0.6 (22967) | 0 (200) |
| Pituitary gland§ | 11.8 (51) | 0.7 (14780) |
| Placenta | 0.2 (146183) | 0.6 (32582) |
| Prostate | 0.2 (126855) | 0 (268) |
| Retina | 0.25 (76081) | 0.3 (83724) |
| RPE | 0 (52973) | 0 (5117) |
| Salivary gland | 0.06 (54316) | 0 (1752) |
| Skeletal muscle | 0.45 (33733) | 1.5 (4854) |
| Skin | 0.2 (181195) | 0.2 (64580) |
| Small intestine | 0.3 (15929) | 0.7 (12121) |
| Smooth muscle | 0.2 (1863) | 0 (1102) |
| Spinal cord | 2.2 (2581) | 0.74 (29798) |
| Spleen | 1.2 (9613) | 1.1 (442911) |
| Stomach | 0.2 (99069) | 0.08 (26228) |
| Testis | 2.8 (60938) | 0.4 (152448) |
| Thyroid gland | 0.14 (2119) | 1.8 (6) |
| Whole organism | 0.4 (8576334) | 0.4 (6673691) |
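Each entry in Table 1 is a ratio of hypothetical-protein cDNAs to total organ cDNAs, so values from small libraries rest on very few clones. The sketch below recomputes two entries and attaches a Wilson score interval to show how much wider the uncertainty is for a small library. The lacrimal count of 237 comes from Table 2 and the totals from Table 1; the pituitary count of 6 is an assumption back-calculated from 11.8% of 51, and the Wilson interval is an added illustration, not part of the authors' analysis.

```python
import math


def percent_hypothetical(hypothetical, total):
    """Hypothetical-protein cDNAs as a percentage of all cDNAs for an organ."""
    if total == 0:
        return 0.0
    return 100.0 * hypothetical / total


def wilson_interval(k, n, z=1.96):
    """Approximate 95% Wilson score interval for the proportion k/n."""
    if n == 0:
        return (0.0, 1.0)
    p = k / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


# (hypothetical count, total cDNAs); the pituitary count is an assumption.
libraries = {
    "Lacrimal gland (human)": (237, 5298),
    "Pituitary gland (human)": (6, 51),
}

for organ, (k, n) in libraries.items():
    low, high = wilson_interval(k, n)
    print(f"{organ}: {percent_hypothetical(k, n):.1f}% "
          f"(interval {100 * low:.1f}% to {100 * high:.1f}%, n = {n})")
```

The lacrimal value is supported by several thousand cDNAs, whereas the pituitary value depends on a handful of clones, so the two percentages should not be read as equally reliable.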
Table 2.
 
EST Alignment to Exons, Introns, or Intergenic Regions for All Lacrimal Gland Hypothetical Proteins
Alignment Human EST Mouse Ortholog Mouse EST Human Ortholog
5′ Exons 11 7 0 0
Middle or 3′ exons 165* 130 8 7
Total exons 176 137 8 7
Introns, possibly exon via alternative splicing* 11 9 0 0
Total introns 16 12 0 0
Intergenic, possibly exon via alternative splicing* 19 12 3 3
Total intergenic 37 20 4 3
Pseudogenes 2 0
Ambiguous 6 0 0
Total hypothetical 237 12
Total orthologs 169 10
Figure 3.
 
Eleven most frequent protein domains in human hypothetical proteins, excluding those designated as uncharacterized conserved domains: zinc finger (smart00356, smart00451, KOG1994, KOG2462, KOG2990, and KOG1311); WD40 (cd00200, COG2319, KOG2321, and KOG0271); protein kinase (cd00276, cd00089, pfam00069, and smart00220), Src homology (cd00173, cd00174, and smart00326); RhoGEF (cd00160 and smart00325); O-methyltransferase (COG4122, KOG1661, KOG2915, and pfam01135); amino acid (AA) transporters (KOG1287, KOG1303, and KOG3832); membrane protein (COG5373, KOG3318, and KOG3918); myosin tail (or chain; KOG0161 and pfam01576); pleckstrin homology (smart00233); and tetratricopeptide repeat (cd00189).
Figure 4.
 
Lacrimal gene activity is unevenly distributed over the human genome and displays clustering suggestive of cotranscriptional regulation. Arrows: location and orientation of lacrimal-preferred genes, defined as those expressed in six or fewer other organs, as determined by manual inspection of UniGene (NCBI). Gene activity is represented by number of cDNA clones per gene and is compared with gene number in 39-Mb bins. Genes in the proximal half of chromosome 12 and distal end of chromosome 9 are particularly active. Not detected but present on proximal chromosome 12 is AAAS, the mutation of which is associated with alacrima.
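The comparison of gene activity with gene number in 39-Mb bins can be sketched as a simple tally. This is a minimal illustration assuming each gene is represented by a chromosome name, a start coordinate, and its cDNA clone count; the function name `bin_activity` and the example records are hypothetical and are not data from the study.

```python
from collections import defaultdict

BIN_SIZE_BP = 39_000_000  # 39-Mb bins, as in the figure legend


def bin_activity(records, bin_size=BIN_SIZE_BP):
    """Tally genes and cDNA clones per (chromosome, bin index).

    `records` is an iterable of (chrom, start_bp, clone_count) tuples.
    Returns {(chrom, bin_index): (gene_count, clone_count)}.
    """
    gene_counts = defaultdict(int)
    clone_counts = defaultdict(int)
    for chrom, start_bp, clones in records:
        key = (chrom, start_bp // bin_size)
        gene_counts[key] += 1
        clone_counts[key] += clones
    return {key: (gene_counts[key], clone_counts[key]) for key in gene_counts}


# Hypothetical records: (chromosome, gene start in bp, number of cDNA clones).
example = [
    ("chr12", 5_000_000, 40),
    ("chr12", 30_000_000, 10),
    ("chr12", 50_000_000, 3),
    ("chr9", 130_000_000, 25),
]

for (chrom, index), (genes, clones) in sorted(bin_activity(example).items()):
    start_mb = index * BIN_SIZE_BP // 1_000_000
    print(f"{chrom} bin starting at {start_mb} Mb: {genes} gene(s), {clones} clone(s)")
```

Bins in which the clone count is high relative to the gene count are the ones that would stand out as particularly active in a plot of this kind.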
Figure 5.
 
Human lacrimal-preferred genes display no overall preference for other organs, whereas mouse orthologs are more widely activated (inset). These observations are in keeping with the possible presence of a human lacrimal enhancer that is poorly conserved in mouse.
Figure 6.
 
More than one half of lacrimal hypothetical proteins display a twofold or greater increase in cancer-associated expression, dependent on cancer type and biopsy. Slightly less than one half decrease in other cancers (not shown). Illustrated are breast and prostate cancer data compiled from the Cancer Genome Anatomy Project SAGE library collection in GEO (NCBI). Only a few (*) lack apparent mouse orthologs.
The authors thank Mid-Atlantic CHTN Director Christopher Moskaluk for assistance in human tissue procurement; Carrol Lawrence, Christopher Scott, Rocio-Maria Brion Garza, and Patee Buchoff for data analysis; and David Lipman (NCBI) for suggestions on hypothetical protein analysis. 
References
1. Curwen V, Eyras E, Andrews TD, et al. The Ensembl automatic gene annotation system. Genome Res. 2004;14:942–950.
2. Rogic S, Mackworth AK, Ouellette FB. Evaluation of gene-finding programs on mammalian sequences. Genome Res. 2001;11:817–832.
3. Thomas JD, Lee T, Suh NP. A function-based framework for understanding biological systems. Ann Rev Biophys Biomol Struct. 2004;33:75–93.
4. Cohen S. Nobel lecture: epidermal growth factor. Biosci Rep. 1986;6:1017–1028.
5. Levi-Montalcini R. The nerve growth factor 35 years later. Science. 1987;237:1154–1162.
6. Gospodarowicz D. Localisation of a fibroblast growth factor and its effect alone and with hydrocortisone on 3T3 cell growth. Nature. 1974;249:123–127.
7. Sanghi S, Kumar R, Lumsden A, et al. cDNA and genomic cloning of lacritin, a novel secretion enhancing factor from the human lacrimal gland. J Mol Biol. 2001;310:127–139.
8. Bosma AJ, Weigelt B, Lambrechts AC, et al. Detection of circulating breast tumor cells by differential expression of marker genes. Clin Cancer Res. 2002;8:1871–1877.
9. Lu HH, Kofron MD, El-Amin SF, Attawia MA, Laurencin CT. In vitro bone formation using muscle-derived cells: a new paradigm for bone tissue engineering using polymer-bone morphogenetic protein matrices. Biochem Biophys Res Commun. 2003;305:882–889.
10. Langer R, Tirrell DA. Designing materials for biology and medicine. Nature. 2004;428:487–492.
11. Schaumberg DA, Sullivan DA, Buring JE, Dana MR. Prevalence of dry eye syndrome among US women. Am J Ophthalmol. 2003;136:318–326.
12. Wang J, Laurie GW. Organogenesis of the exocrine gland. Dev Biol. 2004;273:1–22.
13. Dickinson DP, Thiesse M. A major human lacrimal gland mRNA encodes a new proline-rich protein family member. Invest Ophthalmol Vis Sci. 1995;36:2020–2031.
14. Wistow G, Bernstein SL, Wyatt MK, et al. Expressed sequence tag analysis of adult human lens for the NEIBank Project: over 2000 non-redundant transcripts, novel genes and splice variants. Mol Vis. 2002;8:171–184.
15. Wistow G, Bernstein SL, Touchman JW, et al. Grouping and identification of sequence tags (GRIST): bioinformatics tools for the NEIBank database. Mol Vis. 2002;8:164–170.
16. Ewing B, Green P. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 1998;8:186–194.
17. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215:403–410.
18. Lenhard B, Sandelin A, Mendoza L, Engstrom P, Jareborg N, Wasserman WW. Identification of conserved regulatory elements by comparative genome analysis. J Biol. 2003;2:13.
19. Wasserman WW, Palumbo M, Thompson W, Fickett JW, Lawrence CE. Human-mouse genome comparisons to locate regulatory sites. Nat Genet. 2000;26:225–228.
20. Strausberg RL. The Cancer Genome Anatomy Project: new resources for reading the molecular signatures of cancer. J Pathol. 2001;195:31–40.
21. Porter DA, Krop IE, Nasser S, et al. A SAGE (serial analysis of gene expression) view of breast tumor progression. Cancer Res. 2001;61:5697–5702.
22. Gasymov OK, Abduragimov AR, Yusifov TN, Glasgow BJ. Site-directed tryptophan fluorescence reveals the solution structure of tear lipocalin: evidence for features that confer promiscuity in ligand binding. Biochem. 2001;40:14754–14762.
23. Smoot ME, Guerlain SA, Pearson WR. Visualization of near-optimal sequence alignments. Bioinformatics. 2004;20:953–958.
24. Ota T, Suzuki Y, Nishikawa T, et al. Complete sequencing and characterization of 21,243 full-length human cDNAs. Nat Genet. 2004;36:40–45.
25. Huttenhofer A, Brosius J, Bachellerie JP. RNomics: identification and function of small, non-messenger RNAs. Curr Opin Chem Biol. 2002;6:835–843.
26. Humphray SJ, Oliver K, Hunt AR, et al. DNA sequence and analysis of human chromosome 9. Nature. 2004;429:369–374.
27. Su AI, Cooke MP, Ching KA, et al. Large-scale analysis of the human and mouse transcriptomes. Proc Natl Acad Sci USA. 2002;99:4465–4470.
28. Su AI, Wiltshire T, Batalov S, et al. A gene atlas of the mouse and human protein-encoding transcriptomes. Proc Natl Acad Sci USA. 2004;101:6062–6067.
29. Snel B, Lehmann G, Bork P, Huynen MA. STRING: a web-server to retrieve and display the repeatedly occurring neighbourhood of a gene. Nucleic Acids Res. 2000;28:3442–3444.
30. Porter D, Weremowicz S, Chin K, et al. A neural survival factor is a candidate oncogene in breast cancer. Proc Natl Acad Sci USA. 2003;100:10931–10936.
31. Jiang Y, Harlocker SL, Molesh DA, et al. Discovery of differentially expressed genes in human breast cancer using subtracted cDNA libraries and cDNA microarrays. Oncogene. 2002;21:2270–2282.
32. Bera TK, Lee S, Salvatore G, Lee B, Pastan I. MRP8, a new member of ABC transporter superfamily, identified by EST database mining and gene prediction program, is highly expressed in breast cancer. Mol Med. 2001;7:509–516.
33. Strausberg RL, Camargo AA, Riggins GJ, et al. An international database and integrated analysis tools for the study of cancer gene expression. Pharmacogenomics. 2002;2:156–164.
34. Whitfield ML, Sherlock G, Saldanha AJ, et al. Identification of genes periodically expressed in the human cell cycle and their expression in tumors. Mol Biol Cell. 2002;13:1977–2000.
35. Nguyen DH, Toshida H, Schurr J, Beuerman RW. Microarray analysis of the rat lacrimal gland following the loss of parasympathetic control of secretion. Physiol Genomics. 2004;18:108–118.
36. Pavlidis P, Noble WS. Analysis of strain and regional variation in gene expression in mouse brain. Genome Biol. 2001;2:RESEARCH0042.
37. Toda I, Sullivan BD, Wickham LA, Sullivan DA. Gender- and androgen-related influence on the expression of proto-oncogene and apoptotic factor mRNAs in lacrimal glands of autoimmune and non-autoimmune mice. J Steroid Biochem Mol Biol. 1999;71:49–61.
38. Remington SG, Nelson JD. mRNA encoding a new lipolytic enzyme expressed in rabbit lacrimal glands. Invest Ophthalmol Vis Sci. 2002;43:3617–3624.
39. Pavlidis P, Li Q, Noble WS. The effect of replication on gene expression microarray experiments. Bioinformatics. 2003;19:1620–1627.
40. Weigelt B, Bosma AJ, van 't Veer LJ. Expression of a novel lacrimal gland gene lacritin in human breast tissues. J Cancer Res Clin Oncol. 2003;129:735–736.
41. Thiesse M, Millar SJ, Dickinson DP. The human type 2 cystatin gene family consists of eight to nine members, with at least seven genes clustered at a single locus on human chromosome 20. DNA Cell Biol. 1994;13:97–116.
42. She X, Horvath JE, Jiang Z, et al. The structure and evolution of centromeric transition regions within the human genome. Nature. 2004;430:857–864.
43. Eichler EE, Sankoff D. Structural dynamics of eukaryotic chromosome evolution. Science. 2003;301:793–797.
44. Tsai FY, Keller G, Kuo FC, et al. An early haematopoietic defect in mice lacking the transcription factor GATA-2. Nature. 1994;371:221–226.
45. Sivak JM, West-Mays JA, Yee A, Williams T, Fini ME. Transcription factors Pax6 and AP-2alpha interact to coordinate corneal epithelial repair by controlling expression of matrix metalloproteinase gelatinase B. Mol Cell Biol. 2004;24:245–257.
46. Postigo AA, Depp JL, Taylor JJ, Kroll KL. Regulation of Smad signaling through a differential recruitment of coactivators and corepressors by ZEB proteins. EMBO J. 2003;22:2453–2462.
47. Nishimura DY, Searby CC, Alward WL, et al. A spectrum of FOXC1 mutations suggests gene dosage as a mechanism for developmental defects of the anterior chamber of the eye. Am J Hum Genet. 2001;68:364–372.