The Dfam database of repetitive DNA families
- PMID: 26612867
- PMCID: PMC4702899
- DOI: 10.1093/nar/gkv1272
Abstract
Repetitive DNA, especially that due to transposable elements (TEs), makes up a large fraction of many genomes. Dfam is an open access database of families of repetitive DNA elements, in which each family is represented by a multiple sequence alignment and a profile hidden Markov model (HMM). The initial release of Dfam, featured in the 2013 NAR Database Issue, contained 1143 families of repetitive elements found in humans, and was used to produce more than 100 Mb of additional annotation of TE-derived regions in the human genome, with improved speed. Here, we describe recent advances, most notably expansion to 4150 total families including a comprehensive set of known repeat families from four new organisms (mouse, zebrafish, fly and nematode). We describe improvements to coverage, and to our methods for identifying and reducing false annotation. We also describe updates to the website interface. The Dfam website has moved to http://dfam.org. Seed alignments, profile HMMs, hit lists and other underlying data are available for download.
© The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Figures
![Figure 1.](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/4702899/bin/gkv1272fig1.gif)
![Figure 2.](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/4702899/bin/gkv1272fig2.gif)
![Figure 3.](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/4702899/bin/gkv1272fig3.gif)
![Figure 4.](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/4702899/bin/gkv1272fig4.gif)
![Figure 5.](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/4702899/bin/gkv1272fig5.gif)
![Figure 6.](https://www.ncbi.nlm.nih.gov/pmc/articles/instance/4702899/bin/gkv1272fig6.gif)