Disregarding multimappers leads to biases in the functional assessment of NGS data
- PMID: 38720252
- PMCID: PMC11078754
- DOI: 10.1186/s12864-024-10344-9
Abstract
Background: Standard ChIP-seq and RNA-seq processing pipelines typically disregard sequencing reads whose origin is ambiguous ("multimappers"). This usual practice has potentially important consequences for the functional interpretation of the data: genomic elements belonging to clusters composed of highly similar members are left unexplored.
Results: In particular, disregarding multimappers leads to the underrepresentation in epigenetic studies of recently active transposable elements, such as AluYa5, L1HS and SVAs. Furthermore, this common strategy also has implications for transcriptomic analysis: members of repetitive gene families, such as the ones including major histocompatibility complex (MHC) class I and II genes, are under-quantified.
Conclusion: Revealing inherent biases that permeate routine tasks such as functional enrichment analysis, our results underscore the urgency of broadly adopting multimapper-aware bioinformatic pipelines (currently restricted to specific contexts or communities) to ensure the reliability of genomic and transcriptomic studies.
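The under-quantification described above can be illustrated with a toy example. The sketch below contrasts unique-only counting with fractional assignment, which is one common multimapper-aware strategy and not necessarily the approach used in the article. The read names, the read-to-gene assignments, and both function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def count_unique_only(alignments):
    """Count only reads that align to exactly one feature.

    Multimappers (reads with more than one candidate feature) are dropped,
    mimicking a pipeline that disregards ambiguous reads.
    """
    counts = defaultdict(float)
    for features in alignments.values():
        if len(features) == 1:
            counts[features[0]] += 1.0
    return dict(counts)


def count_fractional(alignments):
    """Split each read evenly across all of its candidate features (1/n each)."""
    counts = defaultdict(float)
    for features in alignments.values():
        if not features:
            continue  # unaligned read: nothing to count
        weight = 1.0 / len(features)
        for feature in features:
            counts[feature] += weight
    return dict(counts)


# Hypothetical toy data: read name -> features the read aligns to equally well.
# HLA-A and HLA-B stand in for two highly similar members of a gene family.
alignments = {
    "read1": ["GENE_X"],
    "read2": ["HLA-A", "HLA-B"],
    "read3": ["HLA-A", "HLA-B"],
    "read4": ["HLA-A"],
    "read5": ["GENE_X"],
}

unique_counts = count_unique_only(alignments)
fractional_counts = count_fractional(alignments)

print("unique-only:", unique_counts)
print("fractional: ", fractional_counts)
```

In this toy data set, HLA-B receives no counts at all under unique-only counting, because every read supporting it is ambiguous, while the single-copy gene is unaffected. Fractional assignment is only one option; other multimapper-aware tools use, for example, expectation-maximization to distribute ambiguous reads.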
Keywords: ChIP-seq; Functional analysis; Multimappers; Next-generation sequencing (NGS); RNA-seq.
© 2024. The Author(s).
Conflict of interest statement
The authors declare no competing interests.
Figures
![Fig. 1](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/11080154/bin/12864_2024_10344_Fig1_HTML.gif)
![Fig. 2](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/11080154/bin/12864_2024_10344_Fig2_HTML.gif)