To generate a comprehensive catalog of previously unannotated noncoding transcripts from WT rod and S-cone–like Nrl−/− photoreceptor RNA-seq libraries, we performed genome-guided de novo transcriptome assembly for each sample individually. Short reads in each library were first aligned to the Mus musculus reference genome (GRCm38.p3) with the splice-aware aligner TopHat2 v2.0.11⁴⁸ (Supplementary Table S1). De novo assembly was then performed using Cufflinks v2.2.1.⁴⁹ Transcript features identified through transcriptome assembly were queried against the Ensembl v78 database to identify previously unannotated transcripts (see GEO submission GSE74660 for the GTF file of all unannotated transcripts). Among all putative transcripts, we identified lncRNA and asRNA sequences as follows: (1) we selected only intergenic and antisense transcripts with ≥2 exons; (2) transcripts < 200 nucleotides in length were filtered out; (3) the coding potential of each transcript was tested with TransDecoder v1 (available in the public domain at http://transdecoder.github.io/), and transcripts with an open reading frame ≥ 50 were excluded from further analysis; and (4) the remaining transcripts were queried against the Pfam-A and Pfam-B⁵⁰ databases v27.0 with HMMER3 v3.1b1 (available in the public domain at http://hmmer.janelia.org) to check whether they contained any functional protein domain. In this step, the E-value threshold was set to 0.05, and transcripts above this threshold were considered noncoding sequences. To validate our de novo pipeline, we performed quantitative (q) RT-PCR on 10 previously unannotated lncRNAs (Supplementary Fig. S5) and detected all of them in RNA isolated from whole retina. qRT-PCR reactions were performed in triplicate and normalized to the endogenous control, Hprt. The results were analyzed with QuantStudio Design and Analysis Software (Applied Biosystems, Foster City, CA, USA).
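
The four selection criteria described above can be summarized as a single filter. The following Python sketch is illustrative only and is not the code used in this study: the `Transcript` record, its field names, and the function names are hypothetical, and it assumes that the outputs of the assembly, TransDecoder, and HMMER steps have already been parsed into one record per transcript. The unit of the open reading frame cutoff is left as reported by the ORF caller, and transcripts with no Pfam hit at all are assumed to pass the domain filter.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Cutoffs taken from the text; the unit of the ORF cutoff is not stated there.
MIN_EXONS = 2
MIN_LENGTH_NT = 200
ORF_CUTOFF = 50
EVALUE_CUTOFF = 0.05


@dataclass
class Transcript:
    """Hypothetical per-transcript summary of the upstream tool outputs."""
    transcript_id: str
    transcript_class: str              # e.g. "intergenic" or "antisense"
    n_exons: int
    length_nt: int
    longest_orf: int                   # longest ORF reported by the ORF caller
    best_pfam_evalue: Optional[float]  # None if no Pfam hit was reported


def is_candidate_noncoding(t: Transcript) -> bool:
    """Apply criteria (1) to (4) in order; any failure rejects the transcript."""
    # (1) only intergenic or antisense transcripts with at least 2 exons
    if t.transcript_class not in {"intergenic", "antisense"}:
        return False
    if t.n_exons < MIN_EXONS:
        return False
    # (2) drop transcripts shorter than 200 nucleotides
    if t.length_nt < MIN_LENGTH_NT:
        return False
    # (3) drop transcripts whose longest ORF reaches the cutoff
    if t.longest_orf >= ORF_CUTOFF:
        return False
    # (4) drop transcripts with a Pfam domain hit at or below the E-value cutoff
    #     (assumption: no hit at all counts as noncoding)
    if t.best_pfam_evalue is not None and t.best_pfam_evalue <= EVALUE_CUTOFF:
        return False
    return True


def filter_noncoding(transcripts: Iterable[Transcript]) -> List[Transcript]:
    """Return the transcripts that pass all four criteria."""
    return [t for t in transcripts if is_candidate_noncoding(t)]
```

Applying the cheap structural checks (class, exon count, length) before the coding-potential checks mirrors the order of the steps in the text, so that only the remaining transcripts need ORF and domain annotations.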