SEAL: a distributed short read mapping and duplicate removal tool
- PMID: 21697132
- PMCID: PMC3137215
- DOI: 10.1093/bioinformatics/btr325
Abstract
Summary: SEAL is a scalable tool for short read pair mapping and duplicate removal. It computes mappings that are consistent with those produced by BWA and removes duplicates according to the same criteria employed by Picard MarkDuplicates. On a 16-node Hadoop cluster, it is capable of processing about 13 GB per hour in map+rmdup mode, while reaching a throughput of 19 GB per hour in mapping-only mode.
Availability: SEAL is available online at http://biodoop-seal.sourceforge.net/.
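The duplicate-removal step mentioned in the Summary can be illustrated with a minimal sketch. It is not SEAL's code and does not reproduce Picard MarkDuplicates exactly: it assumes, for illustration only, that read pairs whose two ends share the same reference, 5' coordinate, and strand are duplicates, and that the pair with the highest summed base quality is kept. The `ReadPair` type and the functions `duplicate_key` and `remove_duplicates` are hypothetical names. The grouping step plays the role of a MapReduce shuffle and the selection step that of a reducer.

```python
from collections import defaultdict
from typing import NamedTuple


class ReadPair(NamedTuple):
    """Hypothetical, simplified record for one mapped read pair."""
    name: str
    chrom1: str
    pos1: int      # 5' coordinate of the first read
    strand1: str   # '+' or '-'
    chrom2: str
    pos2: int      # 5' coordinate of the second read
    strand2: str
    qual_sum: int  # summed base qualities over both reads


def duplicate_key(pair):
    """Key shared by pairs assumed to be duplicates of one another.

    The two ends are sorted so the key does not depend on which read
    of the pair is listed first.
    """
    ends = sorted([
        (pair.chrom1, pair.pos1, pair.strand1),
        (pair.chrom2, pair.pos2, pair.strand2),
    ])
    return tuple(ends)


def remove_duplicates(pairs):
    """Keep one pair per duplicate group: the one with the highest qual_sum."""
    groups = defaultdict(list)
    for pair in pairs:                      # "map" + "shuffle": group by key
        groups[duplicate_key(pair)].append(pair)
    # "reduce": pick the best-scoring pair in each group
    return [max(group, key=lambda p: p.qual_sum) for group in groups.values()]


if __name__ == "__main__":
    pairs = [
        ReadPair("a", "chr1", 100, "+", "chr1", 300, "-", 50),
        ReadPair("b", "chr1", 100, "+", "chr1", 300, "-", 70),
        ReadPair("c", "chr1", 100, "+", "chr1", 305, "-", 40),
        ReadPair("d", "chr1", 300, "-", "chr1", 100, "+", 60),
    ]
    for kept in remove_duplicates(pairs):
        print(kept.name, kept.qual_sum)
```

Grouping by a key and reducing each group independently is what lets this kind of step be spread across cluster nodes, since no group needs information from any other.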