Biases in Illumina transcriptome sequencing caused by random hexamer priming

Nucleic Acids Res. 2010 Jul;38(12):e131. doi: 10.1093/nar/gkq224. Epub 2010 Apr 14.

Kasper D Hansen¹, Steven E Brenner, Sandrine Dudoit

Affiliation

¹ Division of Biostatistics, School of Public Health, UC Berkeley, 101 Haviland Hall, Berkeley, CA 94720-7358, USA. khansen@stat.berkeley.edu

PMID: 20395217
PMCID: PMC2896536
DOI: 10.1093/nar/gkq224

Abstract

Generation of cDNA using random hexamer priming induces biases in the nucleotide composition at the beginning of transcriptome sequencing reads from the Illumina Genome Analyzer. The bias is independent of organism and laboratory and impacts the uniformity of the reads along the transcriptome. We provide a read count reweighting scheme, based on the nucleotide frequencies of the reads, that mitigates the impact of the bias.
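
The abstract does not spell out the reweighting formula, so the following is only a minimal sketch of one plausible form of such a scheme. It assumes that each read is weighted by the ratio between how often its leading hexamer occurs at an interior read position (taken here as unbiased) and how often it occurs at the read start. The function names, the choice of hexamers, and the interior offset of 24 are illustrative assumptions, not the authors' published definition.

```python
from collections import Counter

def kmer_proportions(reads, offset, k=6):
    """Proportion of each k-mer observed at 0-based `offset` across reads."""
    counts = Counter(r[offset:offset + k] for r in reads if len(r) >= offset + k)
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}

def read_weights(reads, ref_offset=24, k=6):
    """Hypothetical per-read weights.

    weight(read) = proportion of its leading k-mer at the interior offset
                   / proportion of that k-mer at the read start.
    Over-represented starting k-mers get weights below 1,
    under-represented ones get weights above 1.
    """
    p_start = kmer_proportions(reads, 0, k)
    p_ref = kmer_proportions(reads, ref_offset, k)
    weights = []
    for r in reads:
        h = r[:k]
        if h not in p_start:
            # Read shorter than k: leave it unweighted (an arbitrary choice).
            weights.append(1.0)
        else:
            # k-mers never seen at the interior offset receive weight 0.
            weights.append(p_ref.get(h, 0.0) / p_start[h])
    return weights

# Toy data: "AAAAAA" starts 3 of 4 reads but appears at the interior
# offset in only 1 of 4, so reads starting with it are down-weighted.
reads = ["AAAAAA" + "C" * 18 + "GGGGGG"] * 3 + ["GGGGGG" + "C" * 18 + "AAAAAA"]
weights = read_weights(reads)
print(weights)
```

Reweighted base-level counts would then be obtained by summing these weights, rather than counting 1 per read, over the reads that start at each position.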

Figures

**Figure 1.**
Nucleotide frequencies versus position for stringently mapped reads. For each experiment, mapped reads were extended upstream of the 5′-start position, such that the first position of the actual read is 1 and positions 0 to −20 are obtained from the genome. The first hexamer of the read is shaded. Brief experimental protocols are indicated in the key. **(a)** RNA-Seq experiments conducted using priming with random hexamers, with and without RNA fragmentation. **(b)** DNA resequencing and ChIP-Seq experiments. **(c)** RNA-Seq experiments with alternative library preparation protocols, including priming with random hexamers followed by fragmentation using DNase I and priming with oligo(dT) followed by fragmentation using either DNase I, nebulization or sonication.

**Figure 2.**
Hexamer frequencies. (a) The logarithm (base 2) of all (4096) observed hexamer frequencies computed using positions 1–6 of the aligned reads for an experiment in *H. sapiens* (8) versus an experiment in *S. cerevisiae* (9). The two distributions have a correlation of . (b) As in (a), but the hexamers correspond to positions 25–30 of the aligned reads, with a correlation of .
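
The comparison described in this caption can be sketched as follows: tabulate all 4096 hexamers at a fixed read offset, take log2 of their relative frequencies, and correlate the two experiments. This is an illustrative sketch, not the authors' code. The pseudocount, which avoids log2 of zero for unobserved hexamers, and the use of the Pearson correlation are assumptions. Positions 1–6 correspond to `start=0` and positions 25–30 to `start=24`.

```python
import math
from collections import Counter
from itertools import product

# All 4**6 = 4096 hexamers over the DNA alphabet, in a fixed order.
HEXAMERS = ["".join(p) for p in product("ACGT", repeat=6)]

def log2_hexamer_freqs(reads, start, pseudocount=1.0):
    """log2 relative frequency of every hexamer at 0-based offset `start`.

    Hexamers containing other symbols (e.g. N) are ignored.
    """
    counts = Counter(r[start:start + 6] for r in reads if len(r) >= start + 6)
    total = sum(counts[h] for h in HEXAMERS) + pseudocount * len(HEXAMERS)
    return [math.log2((counts[h] + pseudocount) / total) for h in HEXAMERS]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

# Toy reads standing in for two experiments (hypothetical data).
exp_a = ["ACGTAC" + "T" * 24, "ACGTAC" + "T" * 24, "GGGGGG" + "T" * 24]
exp_b = ["ACGTAC" + "T" * 24, "GGGGGG" + "T" * 24, "GGGGGG" + "T" * 24]

start_a = log2_hexamer_freqs(exp_a, start=0)
start_b = log2_hexamer_freqs(exp_b, start=0)
r_start = pearson(start_a, start_b)
print(r_start)
```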

**Figure 3.**
Nucleotide frequencies versus position for stringently mapped stranded reads for the A nucleotide. (a and b) As in Figure 1a, but split according to whether reads map to the sense or antisense strand. (c) Difference between the frequencies in (a and b).

**Figure 4.**
Evaluation of the reweighting scheme. (a and b) Unadjusted and reweighted base-level counts for reads from the WT experiment mapped to the sense strand of a 1-kb coding region in *S. cerevisiae* (YOL086C). The gray bars near the x-axis indicate unmappable genomic locations. (c) The goodness-of-fit statistics based on unadjusted and reweighted counts for 552 highly expressed regions of constant expression. (d) Smoothed histograms of the reduction in goodness-of-fit statistics when using the reweighting scheme, evaluated in five different experiments. Values greater than zero indicate that the reweighting scheme improves the uniformity of the read distribution.
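
The caption does not define the goodness-of-fit statistic, so the sketch below substitutes a simple chi-square-type statistic against a uniform expectation to illustrate the kind of comparison shown in panels (c) and (d). The statistic, the relative form of the reduction, and the function names are all assumptions made for illustration. Unmappable positions are excluded through a boolean mask.

```python
def uniformity_chisq(counts, mappable=None):
    """Chi-square-type statistic of base-level counts against a uniform mean.

    `mappable` is an optional list of booleans; positions marked False
    are dropped before the statistic is computed.
    """
    if mappable is None:
        mappable = [True] * len(counts)
    kept = [c for c, m in zip(counts, mappable) if m]
    if not kept:
        return 0.0
    mean = sum(kept) / len(kept)
    if mean == 0:
        return 0.0
    return sum((c - mean) ** 2 for c in kept) / mean

def relative_reduction(unadjusted, reweighted, mappable=None):
    """Relative drop in the statistic; positive values mean more uniform."""
    before = uniformity_chisq(unadjusted, mappable)
    after = uniformity_chisq(reweighted, mappable)
    if before == 0:
        return 0.0
    return (before - after) / before

# Toy region: very uneven raw counts, perfectly flat reweighted counts.
unadjusted = [10, 0, 10, 0]
reweighted = [5.0, 5.0, 5.0, 5.0]
stat_before = uniformity_chisq(unadjusted)
reduction = relative_reduction(unadjusted, reweighted)
print(stat_before, reduction)
```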

Cited by

Sequencing accuracy and systematic errors of nanopore direct RNA sequencing.
Liu-Wei W, van der Toorn W, Bohn P, Hölzer M, Smyth RP, von Kleist M. BMC Genomics. 2024 May 28;25(1):528. doi: 10.1186/s12864-024-10440-w. PMID: 38807060.
BEERS2: RNA-Seq simulation through high fidelity in silico modeling.
Brooks TG, Lahens NF, Mrčela A, Sarantopoulou D, Nayak S, Naik A, Sengupta S, Choi PS, Grant GR. Brief Bioinform. 2024 Mar 27;25(3):bbae164. doi: 10.1093/bib/bbae164. PMID: 38605641.
Identification of two major QTLs for pod shell thickness in peanut (Arachis hypogaea L.) using BSA-seq analysis.
Liu H, Zheng Z, Sun Z, Qi F, Wang J, Wang M, Dong W, Cui K, Zhao M, Wang X, Zhang M, Wu X, Wu Y, Luo D, Huang B, Zhang Z, Cao G, Zhang X. BMC Genomics. 2024 Jan 16;25(1):65. doi: 10.1186/s12864-024-10005-x. PMID: 38229017.
Differentially expressed platelet activation-related genes in dogs with stage B2 myxomatous mitral valve disease.
Zhou Q, Cui X, Zhou H, Guo S, Wu Z, Li L, Zhang J, Feng W, Guo Y, Ma X, Chen Y, Qiu C, Xu M, Deng G. BMC Vet Res. 2023 Dec 13;19(1):271. doi: 10.1186/s12917-023-03789-9. PMID: 38087280.
A Workflow Guide to RNA-Seq Analysis of Chaperone Function and Beyond.
Holton KM, Giadone RM, Lang BJ, Calderwood SK. Methods Mol Biol. 2023;2693:39-60. doi: 10.1007/978-1-0716-3342-7_4. PMID: 37540425.

References

1. Mortazavi A, Williams B, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods. 2008;5:621–628.
1. Dohm JC, Lottaz C, Borodina T, Himmelbauer H. Substantial biases in ultra-short read data sets from high-throughput DNA sequencing. Nucleic Acids Res. 2008;36:e105.
1. Langmead B, Trapnell C, Pop M, Salzberg S. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10:R25.
1. Rice P, Longden I, Bleasby A. EMBOSS: The European Molecular Biology Open Software Suite. Trends Genet. 2000;16:276–277.
1. Gentleman R, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004;5:R80.
