Guidelines for reproducible analysis of adaptive immune receptor repertoire sequencing data

doi:10.1093/bib/bbae221

Brief Bioinform. 2024 Mar 27;25(3):bbae221.

Ayelet Peres¹,², Vered Klein¹,², Boaz Frankel¹,², William Lees³,⁴, Pazit Polak¹,², Mark Meehan⁴, Artur Rocha⁴, João Correia Lopes⁴, Gur Yaari¹,²

Affiliations

¹ Faculty of Engineering, Bar Ilan University, 5290002 Ramat Gan, Israel.
² Bar Ilan Institute of Nanotechnology and Advanced Materials, Bar Ilan University, 5290002 Ramat Gan, Israel.
³ Institute of Structural and Molecular Biology, Birkbeck College, University of London, London, United Kingdom.
⁴ INESC TEC - Institute for Systems and Computer Engineering, Technology and Science, Porto, Portugal.

PMID: 38752856
PMCID: PMC11097599
DOI: 10.1093/bib/bbae221

Ayelet Peres et al. Brief Bioinform. 2024.

Abstract

Enhancing the reproducibility and comprehension of adaptive immune receptor repertoire sequencing (AIRR-seq) data analysis is critical for scientific progress. This study presents guidelines for reproducible AIRR-seq data analysis, and a collection of ready-to-use pipelines with comprehensive documentation. To this end, ten common pipelines were implemented using ViaFoundry, a user-friendly interface for pipeline management and automation. This is accompanied by versioned containers, documentation and archiving capabilities. The automation of pre-processing analysis steps and the ability to modify pipeline parameters according to specific research needs are emphasized. AIRR-seq data analysis is highly sensitive to varying parameters and setups; using the guidelines presented here, the ability to reproduce previously published results is demonstrated. This work promotes transparency, reproducibility, and collaboration in AIRR-seq data analysis, serving as a model for handling and documenting bioinformatics pipelines in other research domains.

Keywords: AIRR-seq; FAIR; annotation; pipelines; preprocessing; reproducibility.

Figures

**Figure 1**
Steps for reproducible AIRR-seq analysis pipelines. (A) Tweak and run existing pipelines. In step one, an existing pipeline is selected using its Digital Object Identifier (DOI). In step two, the pipeline’s specification and run environment files are downloaded. In step three, the run parameters (e.g., process parameters, primer files) are adjusted. In step four, AIRR-seq data are obtained from public databases (e.g., ENA, NCBI) or from local storage. In step five, the execution framework is selected: a cloud service (e.g., AWS, Azure, Google), the ViaFoundry execution framework server, or a local automation server platform (e.g., Jenkins). In step six, the analysis is run in the selected framework. Lastly, the updated pipeline files are downloaded in step seven, then documented and archived for future use in steps eight and nine. (B) Create and archive pipelines. In steps one to six, the ViaFoundry framework is used to create the analysis pipeline and set the parameters and run environment. In step seven, the pipeline specification and run environment are obtained. Lastly, the files are documented in a Git repository and archived in Zenodo in steps eight and nine. (C) Create a pipeline with ViaFoundry. The first step is creating processes using the dedicated GUI. The second step is combining different processes into a module. The third step is assembling the full pipeline for analyzing AIRR sequences from a set of modules. This figure was created with BioRender.com.
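
The documentation and archiving steps in panels A and B amount to recording, next to the results, exactly which pipeline, environment and parameters produced them. The following is a minimal sketch of such a record, not part of ViaFoundry or of the published pipelines. The function name `build_run_manifest`, the field names, the DOI, the container tag and the parameter names are all hypothetical placeholders chosen for illustration.

```python
import hashlib
import json


def build_run_manifest(pipeline_doi, container_tag, parameters):
    """Bundle the settings needed to re-run an analysis into one record."""
    record = {
        "pipeline_doi": pipeline_doi,
        "container_tag": container_tag,
        "parameters": parameters,
    }
    # Serialize with sorted keys so identical settings always hash identically.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    record["fingerprint"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return record


manifest = build_run_manifest(
    pipeline_doi="10.0000/placeholder",     # hypothetical DOI, not a real pipeline
    container_tag="example/airrseq:1.0.0",  # hypothetical container tag
    parameters={"maskprimers_max_error": 0.2, "primer_file": "primers.fasta"},
)
print(json.dumps(manifest, indent=2, sort_keys=True))
```

Committing a file like this to the Git repository alongside the pipeline specification gives a later reader a single fingerprint to compare when checking whether two runs used the same setup.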

**Figure 2**
A case study of reproducing AIRR-seq analysis results. (A) The influence of a single pipeline parameter on the number of passed reads. Each facet is an independent repertoire, the x-axis corresponds to the different error rate thresholds used in the MaskPrimers process, and the y-axis is the number of reads that passed the process at each threshold. Yellow bars correspond to the original threshold used to analyze the repertoires, and blue bars correspond to the alternative thresholds. (B) The influence of the initial IGHV germline reference set on mutation load. The x-axis corresponds to the different IGHV germline reference sets, and the y-axis corresponds to the calculated mutation load. (C) IGHV gene mean usage. The x-axis corresponds to the different IGHV genes, and the y-axis corresponds to the mean usage frequency across all control individuals. Green boxes represent the original publication results, and red boxes represent the results obtained by pipeline PP1 listed in Table 1.
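
The effect shown in panel A can be illustrated with a toy threshold sweep. This sketch is a simplified stand-in and not the MaskPrimers implementation: it assumes the primer sits at the very start of each read and defines the error rate as the fraction of mismatched primer positions. The primer, the reads and the function names are invented for illustration.

```python
def primer_error_rate(read, primer):
    """Fraction of primer positions that mismatch the start of the read."""
    if len(read) < len(primer):
        return 1.0  # read too short to contain the primer: treat as full mismatch
    mismatches = sum(1 for r, p in zip(read, primer) if r != p)
    return mismatches / len(primer)


def count_passed(reads, primer, max_error):
    """Number of reads whose primer error rate is within the threshold."""
    return sum(1 for read in reads if primer_error_rate(read, primer) <= max_error)


# Toy primer and reads, invented for illustration only.
primer = "ACGTACGTAC"
reads = [
    "ACGTACGTACTTTT",
    "ACGTACGAACTTTT",
    "ACCTACGAACTTTT",
    "TTTTACGTACTTTT",
]

# Sweep the threshold and report how many reads pass at each setting.
for max_error in (0.0, 0.1, 0.2, 0.3):
    print(max_error, count_passed(reads, primer, max_error))
```

Even in this toy case the number of retained reads changes with every step of the threshold, which is why the threshold needs to be recorded with the results.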
