The first criterion we used to select genes for the human retinal cDNA microarray was expression pattern: specifically, representation in a human retina and/or retinal pigment epithelium (RPE) cDNA library. The initial gene list was generated using EST databases from NCBI (http://www.ncbi.nlm.nih.gov), The Institute for Genomic Research (TIGR; http://www.tigr.org/ Rockville, MD), and sequences from a human RPE cDNA library that was constructed in our laboratory (Chang et al., unpublished results, 1996). A second and complementary criterion was gene function. Included were groups of genes involved in phototransduction, the visual cycle, photoreceptor structure, and retinal and neuronal development. Genes from other functional classes, such as receptors, signal transduction molecules, cell adhesion molecules, and transcription factors, were also included, even if their expression in the retina and/or RPE had not been previously demonstrated. Last, genes known or suspected to be involved in retinal diseases (retinal and macular degeneration, glaucoma, retinal neovascularization) and genes related to general pathologic processes (inflammation, apoptosis, ischemia) were also included. The databases used to identify these genes included NCBI, TIGR, SOURCE (http://genome-www5.stanford.edu/cgi-bin/SMD/source/ hosted in the public domain by Stanford University, Stanford, CA), and RetNet (http://www.sph.uth.tmc.edu/Retnet/ hosted by University of Texas Houston Health Science Center, Houston, TX). The resultant gene list was made nonredundant based on each gene’s UniGene cluster. ESTs that were not assigned to any UniGene cluster were searched by BLAST against the nr database (http://www.ncbi.nlm.nih.gov/Blast; NCBI) and were included on the array only if no identical sequence from a different EST was already on the array. The final list included approximately 12,000 genes and ESTs.
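The redundancy filter described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used: the record layout (accession, UniGene cluster or `None`, sequence), the function name `make_nonredundant`, and the `same_sequence` predicate are all hypothetical, and the predicate (exact string equality by default) merely stands in for a BLAST-based identity check.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical record layout: (accession, unigene_cluster_or_None, sequence)
Record = Tuple[str, Optional[str], str]


def make_nonredundant(
    records: Iterable[Record],
    same_sequence: Callable[[str, str], bool] = lambda a, b: a == b,
) -> List[Record]:
    """Keep one record per UniGene cluster; keep an unclustered EST only
    if no already-kept record carries a matching sequence."""
    records = list(records)
    kept: List[Record] = []
    seen_clusters = set()

    # Pass 1: clustered records; the first record seen for a cluster wins.
    for rec in records:
        cluster = rec[1]
        if cluster is not None and cluster not in seen_clusters:
            seen_clusters.add(cluster)
            kept.append(rec)

    # Pass 2: unclustered ESTs, compared against everything kept so far.
    # same_sequence is a placeholder for a BLAST-based identity test.
    for rec in records:
        if rec[1] is None and not any(
            same_sequence(rec[2], k[2]) for k in kept
        ):
            kept.append(rec)

    return kept
```

Which member of a cluster is retained (here, simply the first encountered) is an arbitrary choice in this sketch; the text does not specify how the representative clone was chosen.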
We were able to obtain plasmids representing 10,034 of the sequences on our master list. Of these, 67% are from known genes and 33% are from ESTs. About half of the known genes on the array have been characterized. Their encoded proteins are involved in a variety of biological (Fig. 1A) and molecular (Fig. 1B) processes.
After generating and purifying PCR products representing the 10,034 cDNAs, we tested various conditions for spotting and hybridization and obtained stronger and more consistent signals using 50% DMSO than with aqueous spotting buffers (data not shown). cDNA labeling methods were also optimized and compared. We found that the indirect method (with aminoallyl-UTP) was superior to direct dye incorporation in terms of the amount of dye incorporated per probe and the ratio of labeled to unlabeled nucleotide in the probe, and that it yielded higher signal-to-noise ratios on hybridization (data not shown). In addition, indirect labeling yielded similar incorporation of Cy3 and Cy5, whereas with the direct method Cy5 incorporation was significantly less than that of Cy3.
To assess the overall performance and reliability of our microarray analyses, we performed self–self hybridizations. Forty micrograms of reference sample total RNA was divided into two aliquots: half was labeled with Cy3 and the other half with Cy5, and the two were then mixed and hybridized together (Fig. 2). A high degree of correlation was achieved between the two channels (correlation coefficient, R² = 0.9432), demonstrating the reproducibility of the labeling, hybridization, and image analysis processes. However, a small fraction of the genes artifactually appeared to be differentially expressed: 21 of the 10,034 sequences showed a twofold expression difference in the hybridization presented in Figure 2. This is a common finding in microarray studies and underscores the importance of performing replicate arrays with dye swapping for each sample. The signal-to-background ratios varied across the spots on each slide and across arrays: the average signal-to-background ratios of the Cy3 and Cy5 channels on different slides ranged from 2 ± 1.9 to 9.8 ± 6.7, and the percentage of spots with signal-to-background ratios higher than 1.4 ranged from 36% to 85%.
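The quality metrics reported for the self–self hybridization can be illustrated with a short sketch. It rests on assumptions not stated in the text: per-spot intensities are taken to be positive, background-corrected values; R² is computed as the squared Pearson correlation between the two channels (whether the original analysis used raw or log-transformed intensities is not specified); and a spot is flagged when its Cy5/Cy3 ratio differs at least twofold in either direction. All function names are hypothetical.

```python
import math
from typing import Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def count_twofold(cy3: Sequence[float], cy5: Sequence[float]) -> int:
    """Number of spots whose Cy5/Cy3 ratio is at least twofold up or down."""
    return sum(
        1 for a, b in zip(cy3, cy5) if abs(math.log2(b / a)) >= 1.0
    )


def fraction_above(
    signal: Sequence[float],
    background: Sequence[float],
    threshold: float = 1.4,
) -> float:
    """Fraction of spots whose signal-to-background ratio exceeds threshold."""
    hits = sum(1 for s, b in zip(signal, background) if s / b > threshold)
    return hits / len(signal)
```

In a self–self hybridization every flagged spot is by construction a false positive, so `count_twofold` gives a direct empirical estimate of the per-array artifact rate (21 of 10,034 spots, about 0.2%, in the hybridization described above).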