Int J Mol Sci. 2020 Dec 1;21(23):9161.
doi: 10.3390/ijms21239161.

Benchmarking Long-Read Assemblers for Genomic Analyses of Bacterial Pathogens Using Oxford Nanopore Sequencing

Zhao Chen et al. Int J Mol Sci. 2020.

Abstract

Oxford Nanopore sequencing can be used to obtain complete bacterial genomes. However, Oxford Nanopore long reads have higher error rates than Illumina short reads. Long-read assemblers using a variety of assembly algorithms have been developed to overcome this deficiency, but they have not been benchmarked for genomic analyses of bacterial pathogens using Oxford Nanopore long reads. In this study, six long-read assemblers, namely Canu, Flye, Miniasm/Racon, Raven, Redbean, and Shasta, were therefore benchmarked using Oxford Nanopore long reads of bacterial pathogens. Ten species were tested with mediocre- and low-quality simulated reads, and 10 species were tested with real reads. Raven was the most robust assembler, obtaining complete and accurate genomes. All Miniasm/Racon and Raven assemblies of mediocre-quality reads provided accurate antimicrobial resistance (AMR) profiles, while the Raven assembly of Klebsiella variicola with low-quality reads was the only assembly with an accurate AMR profile among all assemblers and species. All assemblers performed well in predicting virulence genes using mediocre-quality and real reads, whereas only the Raven assemblies of low-quality reads had accurate numbers of virulence genes. Regarding multilocus sequence typing (MLST), Miniasm/Racon was the most effective assembler for mediocre-quality reads, while only the Raven assemblies of Escherichia coli O157:H7 and K. variicola with low-quality reads showed positive MLST results. Miniasm/Racon and Raven were the best performers for MLST using real reads. The Miniasm/Racon and Raven assemblies showed accurate phylogenetic inference. For the pan-genome analyses, Raven was the strongest assembler for simulated reads, whereas Miniasm/Racon and Raven performed the best for real reads. Overall, the most robust and accurate assembler was Raven, closely followed by Miniasm/Racon.

Keywords: Oxford Nanopore sequencing; bacterial pathogen; benchmarking; genome assembly; genomic analysis; long-read assembler; long-read sequencing; whole-genome sequencing.

Conflict of interest statement

The authors declare no conflict of interest. The funder had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Figures

Figure 1
Whole-genome phylogenetic tree of Oxford Nanopore long-read assemblies of Listeria monocytogenes EGD-e with mediocre- and low-quality reads using different long-read assemblers, as aligned to 30 distantly related L. monocytogenes strains selected based on the single-nucleotide polymorphisms (SNPs) strategy (number of SNPs > 500) and compared with the reference genome.
Figure 2
Whole-genome phylogenetic tree of Oxford Nanopore long-read assemblies of Listeria monocytogenes EGD-e with mediocre- and low-quality reads using different long-read assemblers, as aligned to 30 distantly related L. monocytogenes strains selected based on the single-nucleotide polymorphisms (SNPs) strategy (number of SNPs > 500) and 30 closely related L. monocytogenes strains selected based on the SNP (number of SNPs < 500) and core-genome multilocus sequence typing (cgMLST) (different alleles < 500) strategies and compared with the reference genome.
Figure 3
Whole-genome phylogenetic tree of Oxford Nanopore long-read assemblies of Salmonella Bareilly CFSAN000189 with real reads using different long-read assemblers, as aligned to 30 closely related S. Bareilly strains selected based on the single-nucleotide polymorphisms (SNPs) (number of SNPs < 500) and whole-genome multilocus sequence typing (wgMLST) strategies (different alleles < 500) and compared with the reference genome.
Figure 4
Whole-genome phylogenetic tree of Oxford Nanopore long-read assemblies of Campylobacter jejuni NCTC 11168 with real reads using different long-read assemblers, as aligned to 11 closely related C. jejuni strains selected based on the single-nucleotide polymorphisms (SNPs) strategy (number of SNPs < 500) and compared with the reference genome.
Figure 5
Whole-genome phylogenetic tree of Oxford Nanopore long-read assemblies of Campylobacter jejuni NCTC 11168 with real reads using different long-read assemblers, as aligned to 11 closely related (number of SNPs < 500) and 20 distantly related C. jejuni strains (number of SNPs > 500) selected based on the single-nucleotide polymorphisms (SNPs) strategy and compared with the reference genome.
Figure 6
Whole-genome phylogenetic tree of Oxford Nanopore long-read assemblies of Campylobacter jejuni NCTC 11168 with real reads using different long-read assemblers, as aligned to 11 closely related (number of SNPs < 500), 20 distantly related C. jejuni strains (number of SNPs > 500) selected based on the single-nucleotide polymorphisms (SNPs) strategy, and 20 Campylobacter strains of other species and compared with the reference genome.
Figure 7
Pan genomes of Oxford Nanopore long-read assemblies of Pseudomonas aeruginosa PAO1 with mediocre- (a) and low-quality (b) reads using different long-read assemblers and 20 distantly related P. aeruginosa strains selected based on the single-nucleotide polymorphisms (SNPs) strategy (number of SNPs > 500) and compared with the reference genome.
Figure 8
Pan genomes of Oxford Nanopore long-read assemblies of Escherichia coli O157:H7 CFSAN076619 with real reads using different long-read assemblers and six closely related E. coli O157:H7 strains selected based on the whole-genome multilocus sequence typing (wgMLST) strategy (different alleles < 500) and compared with the reference genome.
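
The figure captions above separate closely related strains (fewer than 500 SNPs) from distantly related strains (more than 500 SNPs). A minimal Python sketch of that selection rule follows. The function name, strain labels, and SNP counts are hypothetical illustrations, not code or data from the study. The captions do not say how a count of exactly 500 is handled, so this sketch assumes it is left unclassified.

```python
from typing import Dict, List, Tuple

SNP_THRESHOLD = 500  # cutoff quoted in the figure captions


def split_by_snp_distance(
    snp_counts: Dict[str, int], threshold: int = SNP_THRESHOLD
) -> Tuple[List[str], List[str]]:
    """Return (closely_related, distantly_related) strain names.

    Assumption: a count exactly equal to the threshold falls in neither
    group, because the captions only state "< 500" and "> 500".
    """
    closely = sorted(name for name, n in snp_counts.items() if n < threshold)
    distantly = sorted(name for name, n in snp_counts.items() if n > threshold)
    return closely, distantly


# Hypothetical SNP counts relative to a reference genome (made-up values).
example_counts = {
    "strain_A": 12,
    "strain_B": 499,
    "strain_C": 500,
    "strain_D": 4210,
}
closely, distantly = split_by_snp_distance(example_counts)
print(closely)    # ['strain_A', 'strain_B']
print(distantly)  # ['strain_D']
```

The same comparison could be applied to allele differences from cgMLST or wgMLST, which the captions also bound at 500.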
