HISAT: a fast spliced aligner with low memory requirements

doi:10.1038/nmeth.3317

Nat Methods. 2015 Apr;12(4):357-60.

doi: 10.1038/nmeth.3317. Epub 2015 Mar 9.

Daehwan Kim¹, Ben Langmead², Steven L Salzberg²

Affiliations

¹ [1] Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA. [2] Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland, USA.
² [1] Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA. [2] Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland, USA. [3] Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA.

PMID: 25751142
PMCID: PMC4655817
DOI: 10.1038/nmeth.3317

Daehwan Kim et al. Nat Methods. 2015 Apr.

Abstract

HISAT (hierarchical indexing for spliced alignment of transcripts) is a highly efficient system for aligning reads from RNA sequencing experiments. HISAT uses an indexing scheme based on the Burrows-Wheeler transform and the Ferragina-Manzini (FM) index, employing two types of indexes for alignment: a whole-genome FM index to anchor each alignment and numerous local FM indexes for very rapid extensions of these alignments. HISAT's hierarchical index for the human genome contains 48,000 local FM indexes, each representing a genomic region of ∼64,000 bp. Tests on real and simulated data sets showed that HISAT is the fastest system currently available, with equal or better accuracy than any other method. Despite its large number of indexes, HISAT requires only 4.3 gigabytes of memory. HISAT supports genomes of any size, including those larger than 4 billion bases.
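The two-level idea in the abstract can be illustrated with a toy FM index. The sketch below is not HISAT's implementation: it builds the index by naive suffix sorting, uses a 16-bp window as a stand-in for the ∼64,000-bp regions, and ignores matches that cross window boundaries. The names `build_fm_index`, `find`, `extend_locally`, the `WINDOW` constant, and the example genome string are all hypothetical.

```python
from collections import Counter

def build_fm_index(text):
    """Build a toy FM index by naive suffix sorting (short strings only)."""
    text += "$"  # sentinel; sorts before A, C, G, T in ASCII
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # Burrows-Wheeler transform
    counts = Counter(text)
    first, total = {}, 0  # first[c] = number of symbols smaller than c
    for c in sorted(counts):
        first[c] = total
        total += counts[c]
    # occ[c][i] = occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in counts}
    for i, ch in enumerate(bwt):
        for c in counts:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return sa, first, occ

def find(pattern, index):
    """Backward search: return sorted start positions of exact matches."""
    sa, first, occ = index
    lo, hi = 0, len(sa)
    for c in reversed(pattern):
        if c not in first:
            return []
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])

WINDOW = 16  # toy stand-in for the ~64,000-bp local regions
genome = "ACGTTGCAAGGCTTAACCGGTATCGATTGACC"  # made-up sequence

# One global index to anchor a read, plus one small index per window.
global_index = build_fm_index(genome)
local_indexes = [build_fm_index(genome[s:s + WINDOW])
                 for s in range(0, len(genome), WINDOW)]

def extend_locally(rest, anchor_pos):
    """Search the rest of a read only in the window holding the anchor."""
    w = anchor_pos // WINDOW
    return [w * WINDOW + h for h in find(rest, local_indexes[w])]

# A read whose two parts are separated in the genome (a toy "intron").
read_anchor, read_rest = "ACGTTGCA", "TTAA"
anchor_hits = find(read_anchor, global_index)
candidates = [(a, extend_locally(read_rest, a)) for a in anchor_hits]
print(candidates)
```

Searching the short remainder in a small local index rather than the whole genome is what keeps the extension step cheap: a short sequence that would match many places genome-wide has few candidate positions inside a single window.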

Figures

**Figure 1**
RNA-seq read types and their relative proportions from 20 million simulated 100-bp reads. (a) Five types of RNA-seq reads: (i) M, exonic read; (ii) 2M_gt_15, junction reads with long, >15-bp anchors in both exons; (iii) 2M_8_15, junction reads with intermediate, 8- to 15-bp anchors; (iv) 2M_1_7, junction reads with short, 1- to 7-bp, anchors; and (v) gt_2M, junction reads spanning more than two exons. (b) Relative proportions of different types of reads in the 20 million 100-bp simulated read data.
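The five read categories in this caption can be expressed as a small classifier. This is a sketch for illustration only: the function name `classify_read` and its input format (the number of read bases falling in each exon, in order) are hypothetical, and it assumes that a two-exon read is assigned by the length of its shorter anchor.

```python
def classify_read(segments):
    """Classify a read by the lengths (bp) of its per-exon segments.

    Assumption: for two-exon reads, the category is decided by the
    shorter of the two anchors.
    """
    if not segments or any(n <= 0 for n in segments):
        raise ValueError("segments must be a non-empty list of positive lengths")
    if len(segments) == 1:
        return "M"          # exonic read, no junction
    if len(segments) > 2:
        return "gt_2M"      # spans more than two exons
    shortest = min(segments)
    if shortest > 15:
        return "2M_gt_15"   # long anchors in both exons
    if shortest >= 8:
        return "2M_8_15"    # intermediate anchor
    return "2M_1_7"         # short anchor

for segs in ([100], [60, 40], [90, 10], [95, 5], [30, 40, 30]):
    print(segs, classify_read(segs))
```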

**Figure 2**
Alignment speed of spliced alignment software for 20 million simulated 100-bp reads. Alignment speed for all read types (defined in Fig. 1) combined, measured as the number of reads processed per second by the indicated tools. Supplementary Figure 2 provides the alignment speed for each type of read separately.

**Figure 3**
Alignment accuracy of spliced alignment software for 20 million simulated 100-bp reads. Alignment results for all read types (defined in Fig. 1) on simulated data containing errors. Reads are categorized as indicated by the colors. For multimapped reads, an aligner was credited with a correct alignment if it mapped a read to multiple locations and one of those locations was correct. Note that the set of multimapped reads reported by the various aligners may be different, depending on each program’s alignment policy and default behavior. The upper numbers are the percentages corresponding to correctly and uniquely mapped reads. The numbers inside parentheses show percentages for cases correctly and uniquely mapped and correctly multimapped combined. In Supplementary Table 2, we provide detailed percentages on all four categories for each aligner.
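The crediting rule described in this caption can be sketched as follows. The caption names only the two "correct" categories explicitly; the labels `incorrect` and `unaligned`, the function names, and the toy input are assumptions made for illustration and are not results from the paper.

```python
from collections import Counter

def categorize(reported, true_pos):
    """Assign one read to a category given the positions an aligner reported."""
    if not reported:
        return "unaligned"            # assumed label
    if true_pos in reported:
        # A multimapped read is credited if any reported location is correct.
        return "correct_unique" if len(reported) == 1 else "correct_multi"
    return "incorrect"                # assumed label

def summarize(results):
    """Return (upper number, number in parentheses) as percentages."""
    tally = Counter(categorize(rep, true) for rep, true in results)
    n = len(results)
    unique = 100.0 * tally["correct_unique"] / n
    combined = 100.0 * (tally["correct_unique"] + tally["correct_multi"]) / n
    return unique, combined

# Made-up toy input: (reported positions, true position) per read.
toy = [([10], 10), ([10, 50], 50), ([7], 9), ([], 3)]
print(summarize(toy))
```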

**Figure 4**
Alignment accuracy of spliced-alignment software for reads with small anchors from 20 million simulated reads. This figure shows the alignment sensitivity for reads with small anchors (2M_8_15 and 2M_1_7). Reads are categorized as in Figure 3. The upper numbers on each bar show the percentages corresponding to correctly and uniquely mapped reads. The numbers inside parentheses represent the percentages for cases correctly and uniquely mapped and correctly multimapped combined. There were 1,022,348 and 843,420 reads in 2M_8_15 and 2M_1_7, respectively.

Cited by

CmERF1 acts as a positive regulator of fruits and leaves growth in melon (Cucumis melo L.).
Sun Y, Yang H, Ren T, Zhao J, Lang X, Nie L, Zhao W. Plant Mol Biol. 2024 Jun 6;114(3):70. doi: 10.1007/s11103-024-01468-3. PMID: 38842600
Multiomics analysis of platelet-rich plasma promoting biological performance of mesenchymal stem cells.
Dai P, Wu Y, Gao Y, Li M, Zhu M, Xu H, Feng X, Jin Y, Zhang X. BMC Genomics. 2024 Jun 5;25(1):564. doi: 10.1186/s12864-024-10329-8. PMID: 38840037 Free PMC article.
Senescent glia link mitochondrial dysfunction and lipid accumulation.
Byrns CN, Perlegos AE, Miller KN, Jin Z, Carranza FR, Manchandra P, Beveridge CH, Randolph CE, Chaluvadi VS, Zhang SL, Srinivasan AR, Bennett FC, Sehgal A, Adams PD, Chopra G, Bonini NM. Nature. 2024 Jun;630(8016):475-483. doi: 10.1038/s41586-024-07516-8. Epub 2024 Jun 5. PMID: 38839958 Free PMC article.
A single-cell transcriptome atlas of human euploid and aneuploid blastocysts.
Wang S, Leng L, Wang Q, Gu Y, Li J, An Y, Deng Q, Xie P, Cheng C, Chen X, Zhou Q, Lu J, Chen F, Liu L, Yang H, Wang J, Xu X, Hou Y, Gong F, Hu L, Lu G, Shang Z, Lin G. Nat Genet. 2024 Jun 5. doi: 10.1038/s41588-024-01788-6. Online ahead of print. PMID: 38839885
Chromosome-level genome assembly of the snakefly Mongoloraphidia duomilia (Raphidioptera: Raphidiidae).
Shen R, Sylvester T, Shin NR, Zhan Z, Jin J, Yang D, McKenna DD, Liu X. Sci Data. 2024 Jun 4;11(1):579. doi: 10.1038/s41597-024-03439-1. PMID: 38834590 Free PMC article.
