Julius Ngwa, Robert Wojciechowski, Donald J Zack, Terri Beaty, Ingo Ruczinski; Differential Expression Analysis of Gene and Transcript Abundance for Single Cell RNA-Seq Data using STAR and HISAT Aligners. Invest. Ophthalmol. Vis. Sci. 2017;58(8):1850. doi: https://doi.org/.
© ARVO (1962-2015); The Authors (2016-present)
Single-cell RNA-Seq is becoming one of the most widely used methods for profiling transcription in individual cells. A number of algorithms are currently available for mapping high-throughput RNA-Seq reads against a reference genome and for quantifying the abundance of gene transcripts. Accurate characterization of spliced transcripts is critical for determining their function in normal and diseased cells. Our aim is to compare the gene and transcript counts obtained with the Hierarchical Indexing for Spliced Alignment of Transcripts (HISAT2) and Spliced Transcripts Alignment to a Reference (STAR) aligners.
HISAT2 implements a large set of small graph Ferragina-Manzini (FM) indexes that collectively span the whole genome, enabling rapid and accurate alignment of sequencing reads. The STAR aligner works in two steps, seed searching followed by clustering, stitching and scoring, and is capable of mapping full-length RNA sequences. We analyzed expression profiles of human and mouse cells from the publicly available NCBI Gene Expression Omnibus (GEO) database (Series GSE63473). The data comprise highly parallel, genome-wide expression profiles of individual cells from mouse retinal tissue, obtained by separating the cells into nanoliter-sized aqueous droplets. We compared the digital gene expression (DGE) matrices generated from the aligned libraries, as well as per-cell summaries giving the number of genes and transcripts observed in each cell.
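The FM-index idea behind HISAT2's indexes can be illustrated with a toy backward search. This is a minimal sketch under stated assumptions, not HISAT2's implementation: it builds a single plain FM index (Burrows-Wheeler transform, C array and full occurrence table) over a short string using a naive suffix array, whereas HISAT2 uses many small graph-based FM indexes. The function names `build_fm_index` and `backward_search` are hypothetical.

```python
from typing import Dict, List, Tuple


def build_fm_index(text: str) -> Tuple[List[int], Dict[str, int], Dict[str, List[int]]]:
    """Build a toy FM index: suffix array, C array and full occurrence table.

    Naive construction by sorting suffixes; suitable for short illustrative strings only.
    """
    text += "$"  # sentinel that sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # The BWT character for each suffix is the character preceding it (text[-1] == "$").
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # C[c] = number of characters in text that sort strictly before c.
    C: Dict[str, int] = {}
    total = 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][k] = number of occurrences of c in bwt[:k].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for k, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][k + 1] = occ[c][k] + (1 if ch == c else 0)
    return sa, C, occ


def backward_search(pattern: str, C: Dict[str, int],
                    occ: Dict[str, List[int]], n: int) -> Tuple[int, int]:
    """Return the half-open suffix-array interval [lo, hi) of suffixes starting with pattern.

    Returns (0, 0) when the pattern does not occur.
    """
    lo, hi = 0, n
    for c in reversed(pattern):  # FM indexes extend a match from right to left
        if c not in C:
            return 0, 0
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return 0, 0
    return lo, hi


if __name__ == "__main__":
    sa, C, occ = build_fm_index("ACGTACGTTACG")
    lo, hi = backward_search("ACG", C, occ, len(sa))
    print(hi - lo, sorted(sa[lo:hi]))  # 3 [0, 4, 9]
```

Counting the matches needs only the C array and the occurrence table; the suffix array is consulted only to report positions. A production index would sample the suffix array and use compressed rank structures instead of the full tables used here.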
We found some large differences in the number of transcripts reported by the STAR and HISAT2 aligners. In particular, gene counts tended to be higher with HISAT2 than with STAR. The DGE matrices obtained from the two aligners differed more for mouse cells than for human cells.
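A per-cell comparison of two DGE matrices can be sketched as follows. The sketch assumes each matrix is held as a nested dictionary `{gene: {cell_barcode: count}}`, that a gene counts as detected in a cell when its count is positive, and that "transcripts" means the summed counts (for example UMIs) per cell. The function names and the toy matrices are hypothetical and are not taken from the study's data.

```python
from typing import Dict, Tuple

# Hypothetical layout: {gene: {cell_barcode: count}}, zero counts omitted.
DGE = Dict[str, Dict[str, int]]


def per_cell_summary(dge: DGE) -> Dict[str, Tuple[int, int]]:
    """Return {cell: (genes_detected, total_transcripts)} from a DGE matrix."""
    summary: Dict[str, Tuple[int, int]] = {}
    for cells in dge.values():
        for cell, count in cells.items():
            if count > 0:
                genes, transcripts = summary.get(cell, (0, 0))
                summary[cell] = (genes + 1, transcripts + count)
    return summary


def compare_aligners(dge_a: DGE, dge_b: DGE) -> Dict[str, Tuple[int, int]]:
    """For cells present in both matrices, return {cell: (gene diff, transcript diff)}, a minus b."""
    sum_a, sum_b = per_cell_summary(dge_a), per_cell_summary(dge_b)
    shared = sorted(sum_a.keys() & sum_b.keys())
    return {cell: (sum_a[cell][0] - sum_b[cell][0], sum_a[cell][1] - sum_b[cell][1])
            for cell in shared}


if __name__ == "__main__":
    # Toy matrices for illustration only; not data from GSE63473.
    hisat2 = {"Rho": {"AAAC": 5, "GGTT": 2}, "Sag": {"AAAC": 1}, "Gnat1": {"GGTT": 3}}
    star = {"Rho": {"AAAC": 4, "GGTT": 2}, "Gnat1": {"GGTT": 3}}
    diffs = compare_aligners(hisat2, star)
    print(diffs)  # {'AAAC': (1, 2), 'GGTT': (0, 0)}
    more_genes = sum(1 for g, _ in diffs.values() if g > 0)
    print(more_genes, "of", len(diffs), "cells have more genes with HISAT2")
```

Restricting the comparison to cells present in both matrices keeps barcode-calling differences separate from per-cell count differences; cells found by only one aligner would need to be reported on their own.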
The STAR and HISAT2 aligners report the number of reads that map to a particular genomic position, but not which of the overlapping transcripts those reads originate from. When reads are ambiguous, the resulting uncertainty in counts can produce false differential expression calls for similar isoforms of the same gene. Resolving this fragment assignment ambiguity may therefore be essential in RNA-Seq analysis.
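One common way to resolve fragment assignment ambiguity is an expectation-maximization (EM) algorithm that splits each ambiguous read across its compatible transcripts in proportion to their estimated abundances; this is the general approach of tools such as RSEM, eXpress, kallisto and Salmon, and is not a method the abstract reports using. The sketch below is a simplified assumption: it ignores transcript length, fragment-length distributions and alignment scores, and represents each read only by the set of transcript IDs it is compatible with. The function name `em_isoform_counts` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List


def em_isoform_counts(reads: List[FrozenSet[str]], max_iter: int = 1000,
                      tol: float = 1e-10) -> Dict[str, float]:
    """Estimate expected read counts per transcript by EM.

    Each read is given as the set of transcript IDs it is compatible with.
    Simplified: ignores transcript length and alignment scores.
    """
    if not reads:
        return {}
    transcripts = sorted(set().union(*reads))
    theta = {t: 1.0 / len(transcripts) for t in transcripts}  # relative abundances
    counts: Dict[str, float] = {}
    for _ in range(max_iter):
        # E-step: split each read across its compatible transcripts by current abundance.
        expected: Dict[str, float] = defaultdict(float)
        for compat in reads:
            z = sum(theta[t] for t in compat)
            for t in compat:
                expected[t] += theta[t] / z
        counts = {t: expected[t] for t in transcripts}
        # M-step: abundances proportional to expected counts.
        new_theta = {t: counts[t] / len(reads) for t in transcripts}
        converged = max(abs(new_theta[t] - theta[t]) for t in transcripts) < tol
        theta = new_theta
        if converged:
            break
    return counts


if __name__ == "__main__":
    # Hypothetical reads: 6 unique to A, 2 unique to B, 8 compatible with both.
    reads = ([frozenset({"A"})] * 6 + [frozenset({"B"})] * 2
             + [frozenset({"A", "B"})] * 8)
    est = em_isoform_counts(reads)
    print({t: round(c, 3) for t, c in est.items()})  # {'A': 12.0, 'B': 4.0}
```

In this example the eight shared reads end up split 6:2, the same 3:1 ratio as the unique evidence, instead of being counted twice or discarded.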
This is an abstract that was submitted for the 2017 ARVO Annual Meeting, held in Baltimore, MD, May 7-11, 2017.