Comparative Study
Genome Biol. 2017 May 18;18(1):93.
doi: 10.1186/s13059-017-1213-3.

A comparative evaluation of genome assembly reconciliation tools

Hind Alhakami et al. Genome Biol. 2017.

Abstract

Background: The majority of eukaryotic genomes are unfinished due to the algorithmic challenges of assembling them. A variety of assembly and scaffolding tools are available, but it is not always obvious which tool or parameters to use for a specific genome size and complexity. It is, therefore, common practice to produce multiple assemblies using different assemblers and parameters, then select the best one for public release. A more compelling approach would allow one to merge multiple assemblies with the intent of producing a higher quality consensus assembly, which is the objective of assembly reconciliation.

Results: Several assembly reconciliation tools have been proposed in the literature, but their strengths and weaknesses have never been compared on a common dataset. We fill this need with this work, in which we report on an extensive comparative evaluation of several tools. Specifically, we evaluate contiguity, correctness, coverage, and the duplication ratio of the merged assembly compared to the individual assemblies provided as input.
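As a rough illustration of the contiguity metric, the sketch below computes N50 and NG50 from a list of contig lengths. It is a simplification and not the evaluation pipeline used in the study: the NGA50 reported in the figures additionally requires aligning contigs to a reference and breaking them at misassemblies, which is omitted here. The function names and the example lengths are hypothetical.

```python
def ng50(lengths, genome_size):
    """Largest length L such that contigs of length >= L together cover
    at least half of genome_size. Returns 0 if half is never reached."""
    total = 0
    for length in sorted(lengths, reverse=True):
        total += length
        if 2 * total >= genome_size:
            return length
    return 0


def n50(lengths):
    """N50 is NG50 computed against the assembly's own total length."""
    return ng50(lengths, sum(lengths))


if __name__ == "__main__":
    contigs = [2, 3, 4, 5, 6]           # hypothetical contig lengths
    print(n50(contigs))                 # relative to assembly size (20)
    print(ng50(contigs, 30))            # relative to an assumed genome size
```

Using the genome size rather than the assembly size as the denominator keeps the metric comparable across assemblies of different total length, which matters when a merged assembly is compared with its inputs.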

Conclusions: None of the tools we tested consistently improved the quality of the input GAGE and synthetic assemblies. Our experiments show an increase in contiguity in the consensus assembly when the original assemblies are already of high quality. In terms of correctness, the quality of the results depends on the specific tool, as well as on the quality and the ranking of the input assemblies. In general, the number of misassemblies ranges from comparable to that of the best input assembly to comparable to that of the worst input assembly.

Keywords: Assembly reconciliation; De novo genome assembly; Genomics.

Figures

Fig. 1. Performance of assembly reconciliation algorithms summarized as points on a 2D scatter plot. The x-axis represents contiguity (NGA50) and the y-axis is the number of misassemblies. In this example, input assembly 1 has fewer assembly errors than assembly 2, but assembly 2 is more contiguous. The output assembly is better than both inputs.

Fig. 2. Contiguity–correctness experimental results. Inputs are contigs (top row) or scaffolds (bottom row). Assembly reconciliation tools are given two assembled genomes to merge (from Homo sapiens chromosome 14, Rhodobacter sphaeroides, or Staphylococcus aureus), in which the first assembly has high contiguity and the second has high correctness. The tools were run using default parameters.

Fig. 3. Experimental results on merging high-quality assemblies (top row for input contigs and bottom row for input scaffolds). Tools were run using default parameters.

Fig. 4. Experimental results on merging highly fragmented assemblies (top row for input contigs and bottom row for input scaffolds). Tools were run using default parameters.

Fig. 5. Experimental results on merging multiple assemblies of Staphylococcus aureus (black diamonds). The input order was determined using the feature response score (see text for details). Integer labels indicate successive merging steps. Tools were run using default parameters.

Fig. 6. Results for the eight assembly reconciliation tools. They were given as input (1) chromosomes 4 and 15 of the yeast genome and (2) a flawed version of (1) produced by RSVSim containing a deletion in chromosome 4 (top row), an inversion in chromosome 4 (middle row), or a translocation from chromosome 4 to chromosome 15 (bottom row). (1) and (2) are the first two rows in each plot. Decipher was used to detect synteny blocks between the reference and the outputs and to generate synteny plots displayed as gradients. When the reference and output disagree, the gradients are interrupted. Gray regions indicate blocks that do not match the reference.
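
The reading of the Fig. 1 scatter plot can be sketched in code: an output assembly is "better than both inputs" when it lies to the right of (higher NGA50) and below (fewer misassemblies) every input point. The snippet below is a minimal sketch of that comparison only; the `AssemblyPoint` type, the function names, and all numeric values are hypothetical and are not taken from the study.

```python
from typing import Iterable, NamedTuple


class AssemblyPoint(NamedTuple):
    nga50: int           # contiguity, x-axis of Fig. 1 (higher is better)
    misassemblies: int   # correctness, y-axis of Fig. 1 (lower is better)


def dominates(a: AssemblyPoint, b: AssemblyPoint) -> bool:
    """True if a is at least as good as b on both axes and strictly better on one."""
    no_worse = a.nga50 >= b.nga50 and a.misassemblies <= b.misassemblies
    strictly_better = a.nga50 > b.nga50 or a.misassemblies < b.misassemblies
    return no_worse and strictly_better


def better_than_all(output: AssemblyPoint, inputs: Iterable[AssemblyPoint]) -> bool:
    """True if the merged assembly dominates every input assembly."""
    return all(dominates(output, point) for point in inputs)


if __name__ == "__main__":
    # Hypothetical values mirroring the Fig. 1 scenario: input 1 is more
    # correct, input 2 is more contiguous.
    input_1 = AssemblyPoint(nga50=100_000, misassemblies=5)
    input_2 = AssemblyPoint(nga50=200_000, misassemblies=12)
    merged = AssemblyPoint(nga50=250_000, misassemblies=4)
    print(better_than_all(merged, [input_1, input_2]))
```

A merged assembly that improves on only one axis falls between the inputs rather than dominating them, which corresponds to the mixed outcomes described in the conclusions.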
