Pfam 10 years on: 10,000 families and still growing
- PMID: 18344544
- DOI: 10.1093/bib/bbn010
Abstract
Classifications of proteins into groups of related sequences are in some respects like a periodic table for biology, allowing us to understand the underlying molecular biology of any organism. Pfam is a large collection of protein domains and families. Its scientific goal is to provide a complete and accurate classification of protein families and domains. The next release of the database will contain over 10,000 entries, which leads us to reflect on how far we are from completing this work. Currently, Pfam matches 72% of known protein sequences, but for proteins with known structure Pfam matches 95%, which we believe represents the likely upper bound. Based on our analysis, a further 28,000 families would be required to achieve this level of coverage for the current sequence database. We also show that, as more sequences are added to the sequence databases, the fraction of sequences that Pfam matches is reduced, suggesting that continued addition of new families is essential to maintain its relevance.
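The coverage figures in the abstract are fractions of sequences with at least one family match. The sketch below illustrates that arithmetic only, and why coverage can fall as the sequence database grows. It is not the authors' analysis: the function name `sequence_coverage` and all sequence identifiers are hypothetical, and the numbers are toy values rather than Pfam data.

```python
def sequence_coverage(all_ids, matched_ids):
    """Return the fraction of sequences in all_ids with at least one match."""
    all_ids = set(all_ids)
    if not all_ids:
        raise ValueError("no sequences given")
    # Only count matches that belong to the database being measured.
    return len(all_ids & set(matched_ids)) / len(all_ids)


# Toy database (hypothetical identifiers): 3 of 4 sequences match a family.
database = {"seq1", "seq2", "seq3", "seq4"}
matched = {"seq1", "seq2", "seq3"}
before = sequence_coverage(database, matched)

# The database grows by 4 sequences, but only 1 of them matches an
# existing family, so the matched fraction drops.
grown = database | {"seq5", "seq6", "seq7", "seq8"}
matched_grown = matched | {"seq5"}
after = sequence_coverage(grown, matched_grown)

print(f"coverage before: {before:.2f}, after: {after:.2f}")
```

In this toy case, coverage falls unless new families are added to match the newly deposited sequences, which is the trend the abstract describes.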