Nat Methods. 2015 Apr;12(4):357-60.
doi: 10.1038/nmeth.3317. Epub 2015 Mar 9.

HISAT: a fast spliced aligner with low memory requirements

Daehwan Kim et al. Nat Methods. 2015 Apr.

Abstract

HISAT (hierarchical indexing for spliced alignment of transcripts) is a highly efficient system for aligning reads from RNA sequencing experiments. HISAT uses an indexing scheme based on the Burrows-Wheeler transform and the Ferragina-Manzini (FM) index, employing two types of indexes for alignment: a whole-genome FM index to anchor each alignment and numerous local FM indexes for very rapid extensions of these alignments. HISAT's hierarchical index for the human genome contains 48,000 local FM indexes, each representing a genomic region of ∼64,000 bp. Tests on real and simulated data sets showed that HISAT is the fastest system currently available, with equal or better accuracy than any other method. Despite its large number of indexes, HISAT requires only 4.3 gigabytes of memory. HISAT supports genomes of any size, including those larger than 4 billion bases.
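As a rough illustration of the indexing idea described in the abstract, the following is a minimal sketch of an FM index with backward search, plus the arithmetic that relates a genomic position to a local index. It is a toy under stated assumptions, not HISAT's implementation: the function names (`build_fm_index`, `find`, `local_index_id`) are hypothetical, the suffix array is built by naive sorting, and the local windows are assumed to be fixed-size and non-overlapping, which the abstract does not specify.

```python
def build_fm_index(text):
    """Build a toy FM index (suffix array, C table, Occ table) for `text`."""
    text += "$"  # sentinel, lexicographically smaller than A/C/G/T
    # Naive suffix array: fine for a demo, far too slow for a real genome.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to "$" at i == 0).
    bwt = "".join(text[i - 1] for i in sa)

    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1

    # C[c]: number of characters in the text strictly smaller than c.
    C, total = {}, 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]

    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return sa, C, occ


def find(pattern, index):
    """Return sorted start positions of exact matches, via backward search."""
    sa, C, occ = index
    lo, hi = 0, len(sa)
    for ch in reversed(pattern):
        if ch not in C:
            return []
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


def local_index_id(position, window=64_000):
    """Assumed mapping from a genomic position to a local index number."""
    return position // window


index = build_fm_index("GATTACAGATTACA")
print(find("ATTA", index))      # [1, 8]
print(48_000 * 64_000)          # 3072000000, roughly the human genome size
```

The last line shows why the two figures in the abstract fit together: 48,000 local indexes of about 64,000 bp each cover approximately 3 billion bases.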

Figures

Figure 1
RNA-seq read types and their relative proportions from 20 million simulated 100-bp reads. (a) Five types of RNA-seq reads: (i) M, exonic read; (ii) 2M_gt_15, junction reads with long, >15-bp anchors in both exons; (iii) 2M_8_15, junction reads with intermediate, 8- to 15-bp anchors; (iv) 2M_1_7, junction reads with short, 1- to 7-bp, anchors; and (v) gt_2M, junction reads spanning more than two exons. (b) Relative proportions of different types of reads in the 20 million 100-bp simulated read data.
Figure 2
Alignment speed of spliced alignment software for 20 million simulated 100-bp reads. Alignment speed for all read types (defined in Fig. 1) combined, measured as the number of reads processed per second by the indicated tools. Supplementary Figure 2 provides the alignment speed for each type of read separately.
Figure 3
Alignment accuracy of spliced alignment software for 20 million simulated 100-bp reads. Alignment results for all read types (defined in Fig. 1) on simulated data containing errors. Reads are categorized as indicated by the colors. For multimapped reads, an aligner was credited with a correct alignment if it mapped a read to multiple locations and one of those locations was correct. Note that the set of multimapped reads reported by the various aligners may be different, depending on each program’s alignment policy and default behavior. The upper numbers are the percentages corresponding to correctly and uniquely mapped reads. The numbers inside parentheses show percentages for cases correctly and uniquely mapped and correctly multimapped combined. In Supplementary Table 2, we provide detailed percentages on all four categories for each aligner.
Figure 4
Alignment accuracy of spliced-alignment software for reads with small anchors from 20 million simulated reads. This figure shows the alignment sensitivity for reads with small anchors (2M_8_15 and 2M_1_7). Reads are categorized as in Figure 3. The upper numbers on each bar show the percentages corresponding to correctly and uniquely mapped reads. The numbers inside parentheses represent the percentages for cases correctly and uniquely mapped and correctly multimapped combined. There were 1,022,348 and 843,420 reads in 2M_8_15 and 2M_1_7, respectively.
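
The five read categories defined in the Figure 1 caption can be expressed as a small classifier. This is a sketch under assumptions: the function name `classify_read` and its input (the number of read bases falling in each exon, in order) are hypothetical, and a two-exon read is assumed to be binned by its shorter anchor, which is how the caption's thresholds are read here.

```python
def classify_read(exon_segment_lengths):
    """Classify a read by how its bases are split across exons.

    `exon_segment_lengths` lists the number of read bases in each exon,
    e.g. [100] for an exonic read or [92, 8] for a junction read.
    """
    if not exon_segment_lengths or min(exon_segment_lengths) < 1:
        raise ValueError("each exon segment must contain at least 1 base")

    n_exons = len(exon_segment_lengths)
    if n_exons == 1:
        return "M"        # exonic read, no splice junction
    if n_exons > 2:
        return "gt_2M"    # read spans more than two exons

    # Two exons: bin by the shorter anchor (assumption, see lead-in).
    anchor = min(exon_segment_lengths)
    if anchor > 15:
        return "2M_gt_15"
    if anchor >= 8:
        return "2M_8_15"
    return "2M_1_7"


for segments in ([100], [50, 50], [90, 10], [95, 5], [40, 30, 30]):
    print(segments, classify_read(segments))
```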
