Among the abundantly expressed genes identified in the KC data set is a novel gene of unknown function. For convenience, this has been given the temporary designation
KC6, reflecting its discovery as a group of six ESTs from the KC library. A more formal name awaits further characterization. The gene for
KC6 is on human chromosome 18 at q12.3, close to the gene for phosphoinositide-3-kinase, class 3 (PIK3C3). The only other ESTs in the current version of dbEST that appear to come from the same gene are from embryonic stem cells (accession numbers CD654078 and CN413612). The ESTs from the KC library define a gene with at least six exons, whereas the partial sequence of EST CN413612 suggests the possibility of an additional upstream exon. This gene exhibits alternative splicing and alternative 3′ ends (with alternative polyadenylation signals; Fig. 2) and has all the hallmarks of a polymerase II–dependent gene that encodes an mRNA, with the notable exception that there is no evidence of any significant open reading frame (ORF) in the transcribed sequences. EST CN413612 from human ES cells, which may represent a partial transcript of the same gene, contains a 5′
Alu-like sequence that contains a short ORF similar to some
Alu-derived sequences in GenBank (not shown; http://www.ncbi.nlm.nih.gov/Genbank; provided in the public domain by the National Center for Biotechnology Information, Bethesda, MD), but overall there is no clear evidence that this novel gene encodes a protein, and it may instead belong to the largely mysterious class of noncoding RNAs.⁴² The program RepeatMasker (used to delineate repetitive sequences in the genome; http://ftp.genome.washington.edu/cgi-bin/RepeatMasker/; provided in the public domain by the University of Washington Genome Center, Seattle, WA) detects short stretches of LINE-like sequence⁴³ in some parts of the
KC6 gene, suggesting a possible relationship with retroviral-like sequences. No clear mouse orthologue is apparent, but examination of the mouse genome reveals the presence of four ESTs, also from ES cells, that map to an equivalent region of the mouse genome (on mouse chromosome 18), close to the orthologous
PIK3C3 gene, and this region of the mouse genome also contains LINE-related sequences. This is under further investigation. The sequence of the splice variant encompassing all six exons observed in the KC data set has been submitted to GenBank (accession number: AY762618).
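
The observation that the KC6 transcripts lack any significant open reading frame can be illustrated with a simple ORF scan. The following is only a minimal sketch, not the procedure used in this study: the function names `longest_orf` and `looks_noncoding` are hypothetical, only the three forward-strand frames of a transcript are examined, and the 100-codon cutoff is an assumed, commonly used heuristic rather than a value taken from the text.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end, frame) of the longest ATG-to-stop ORF found in the
    three forward frames of a transcript, or None if no complete ORF exists.
    Coordinates are 0-based; end is exclusive and includes the stop codon."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA letters
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end, frame)
                start = None  # look for the next ORF in this frame
    return best


def looks_noncoding(seq, min_codons=100):
    """Heuristic (assumed cutoff): treat a transcript as lacking a significant
    ORF if its longest ORF has fewer than min_codons sense codons."""
    orf = longest_orf(seq)
    if orf is None:
        return True
    sense_codons = (orf[1] - orf[0]) // 3 - 1  # exclude the stop codon
    return sense_codons < min_codons


if __name__ == "__main__":
    toy = "CCATGAAATTTTAGGG"  # toy sequence, not KC6
    print(longest_orf(toy), looks_noncoding(toy))
```

In practice, a transcript sequence such as the GenBank entry AY762618 would be supplied in place of the toy sequence. A short ORF alone does not establish that a transcript is noncoding, so a scan like this is best read as a first-pass filter.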