Reproducible, scalable, and shareable analysis pipelines with bioinformatics workflow managers
- PMID: 34556866
- DOI: 10.1038/s41592-021-01254-9
Abstract
The rapid growth of high-throughput technologies has transformed biomedical research. With the increasing amount and complexity of data, scalability and reproducibility have become essential not just for experiments, but also for computational analysis. However, transforming data into information involves running a large number of tools, optimizing parameters, and integrating dynamically changing reference data. Workflow managers were developed in response to such challenges. They simplify pipeline development, optimize resource usage, handle software installation and versions, and run on different compute platforms, enabling workflow portability and sharing. In this Perspective, we highlight key features of workflow managers, compare commonly used approaches for bioinformatics workflows, and provide a guide for computational and noncomputational users. We outline community-curated pipeline initiatives that enable novice and experienced users to perform complex, best-practice analyses without having to manually assemble workflows. In sum, we illustrate how workflow managers contribute to making computational analysis in biomedical research shareable, scalable, and reproducible.
© 2021. Springer Nature America, Inc.
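The abstract's point that workflow managers take over the bookkeeping of running many tools in the right order can be illustrated with a minimal sketch. It is not the method of any particular workflow manager: the step names, the `PIPELINE` mapping, and the `plan` helper are hypothetical and chosen only for illustration. It assumes Python 3.9 or later, where the standard-library `graphlib` module is available.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each step maps to the steps it depends on.
# The step names are illustrative and do not come from the article.
PIPELINE = {
    "trim_reads": [],
    "align_reads": ["trim_reads"],
    "call_variants": ["align_reads"],
    "qc_report": ["trim_reads"],
}


def plan(pipeline, done=()):
    """Return the steps in an order that respects dependencies.

    Steps listed in `done` are treated as already finished and skipped,
    a simplified stand-in for the way workflow managers avoid recomputing
    results that are already up to date.
    """
    order = TopologicalSorter(pipeline).static_order()
    return [step for step in order if step not in done]


if __name__ == "__main__":
    # Full run: every step, dependencies first.
    print(plan(PIPELINE))
    # Resumed run: trimming already finished, so it is left out.
    print(plan(PIPELINE, done={"trim_reads"}))
```

Real workflow managers typically infer this dependency graph from declared inputs and outputs rather than from a hand-written mapping, and they add scheduling, software environment handling, and execution on different compute platforms on top of it.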