AnchorWave: Sensitive alignment of genomes with high sequence diversity, extensive structural polymorphism, and whole-genome duplication
- PMID: 34934012
- PMCID: PMC8740769
- DOI: 10.1073/pnas.2113075119
Abstract
Millions of species are currently being sequenced, and their genomes are being compared. Many of them have more complex genomes than model systems and pose new challenges for genome alignment. Widely used local alignment strategies often produce limited or incongruous results when applied to genomes with dispersed repeats, long indels, and highly diverse sequences. Moreover, alignment using many-to-many or reciprocal best-hit approaches conflicts with well-studied patterns between species with different rounds of whole-genome duplication. Here, we introduce Anchored Wavefront alignment (AnchorWave), which performs whole-genome duplication-informed collinear anchor identification between genomes and then performs base pair-resolved global alignment of collinear blocks using a two-piece affine gap cost strategy. This strategy enables AnchorWave to precisely identify multikilobase indels generated by transposable element (TE) presence/absence variants (PAVs). When aligning two maize genomes, AnchorWave recalled 87% of previously reported TE PAVs, whereas other genome alignment tools showed low power for TE PAV recall. When comparing diverse genomes, AnchorWave precisely aligns up to three times more of the genome as position matches or indels than the closest competing approach. Moreover, AnchorWave recalls transcription factor-binding sites at rates 1.05- to 74.85-fold higher than other tools, with significantly fewer false-positive alignments. AnchorWave complements available genome alignment tools, offering clear improvements for genomes with dispersed repeats, active TEs, high sequence diversity, and whole-genome duplication variation.
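The two-piece affine gap cost is what lets a global aligner accept a multikilobase TE insertion instead of breaking it into many short gaps. In this cost model, a gap of length k is penalized min(o1 + k·e1, o2 + k·e2), with a cheap-to-open but steep first piece (small o1, larger e1) for short indels and an expensive-to-open but shallow second piece (large o2, small e2) for long indels. The following is a minimal sketch of that cost model as a textbook Gotoh-style O(nm) dynamic program. It is not AnchorWave's wavefront-based implementation, and the function name `two_piece_affine_score` and all default scores are hypothetical, illustrative values rather than AnchorWave's defaults.

```python
import math

NEG_INF = -math.inf


def _matrix(n: int, m: int) -> list:
    """Allocate an (n+1) x (m+1) matrix filled with -inf."""
    return [[NEG_INF] * (m + 1) for _ in range(n + 1)]


def two_piece_affine_score(a: str, b: str, match: int = 2, mismatch: int = -4,
                           o1: int = 4, e1: int = 2,
                           o2: int = 24, e2: int = 1) -> float:
    """Optimal global alignment score of a vs b under a two-piece affine gap cost.

    A gap of length k costs min(o1 + k*e1, o2 + k*e2). All default values are
    illustrative assumptions, not parameters taken from AnchorWave.
    """
    n, m = len(a), len(b)
    H = _matrix(n, m)    # best score of a[:i] vs b[:j], any final column
    E1 = _matrix(n, m)   # final column: b[j-1] vs gap, scored with piece 1
    E2 = _matrix(n, m)   # final column: b[j-1] vs gap, scored with piece 2
    F1 = _matrix(n, m)   # final column: a[i-1] vs gap, scored with piece 1
    F2 = _matrix(n, m)   # final column: a[i-1] vs gap, scored with piece 2
    H[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            if j > 0:
                # Open a new gap from H, or extend the existing gap of the same piece.
                E1[i][j] = max(H[i][j - 1] - o1 - e1, E1[i][j - 1] - e1)
                E2[i][j] = max(H[i][j - 1] - o2 - e2, E2[i][j - 1] - e2)
            if i > 0:
                F1[i][j] = max(H[i - 1][j] - o1 - e1, F1[i - 1][j] - e1)
                F2[i][j] = max(H[i - 1][j] - o2 - e2, F2[i - 1][j] - e2)
            diag = NEG_INF
            if i > 0 and j > 0:
                diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Taking the max over both pieces realizes min(o1 + k*e1, o2 + k*e2).
            H[i][j] = max(diag, E1[i][j], E2[i][j], F1[i][j], F2[i][j])
    return H[n][m]
```

As a usage example, consider a 20 bp sequence and a copy carrying a 30 bp insertion. With these illustrative parameters, the long gap is scored by the shallow second piece (24 + 30 = 54 rather than 4 + 60 = 64). Disabling the second piece, for example with `o2=10**9`, falls back to a single affine cost and penalizes the same insertion more heavily.

```python
a = "ACGT" * 5
b = a[:10] + "T" * 30 + a[10:]
print(two_piece_affine_score(a, b))             # 20 matches (40) - 54 = -14.0
print(two_piece_affine_score(a, b, o2=10**9))   # 40 - 64 = -24.0
```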
Keywords: genome comparison; regulatory element alignment; sensitive genome alignment; transposable element variation; whole-genome duplication.
Copyright © 2021 the Author(s). Published by PNAS.
Conflict of interest statement
The authors declare no competing interest.
Figures
![Fig. 1.](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/8740769/bin/pnas.2113075119fig01.gif)
![Fig. 2.](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/8740769/bin/pnas.2113075119fig02.gif)
![Fig. 3.](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/8740769/bin/pnas.2113075119fig03.gif)