Review
Brief Bioinform. 2008 May;9(3):210-9.
doi: 10.1093/bib/bbn010. Epub 2008 Mar 15.

Pfam 10 years on: 10,000 families and still growing


Stephen John Sammut et al. Brief Bioinform. 2008 May.

Abstract

Classifications of proteins into groups of related sequences are in some respects like a periodic table for biology, allowing us to understand the underlying molecular biology of any organism. Pfam is a large collection of protein domains and families. Its scientific goal is to provide a complete and accurate classification of protein families and domains. The next release of the database will contain over 10,000 entries, which leads us to reflect on how far we are from completing this work. Currently Pfam matches 72% of known protein sequences, but for proteins with known structure Pfam matches 95%, which we believe represents the likely upper bound. Based on our analysis a further 28,000 families would be required to achieve this level of coverage for the current sequence database. We also show that as more sequences are added to the sequence databases the fraction of sequences that Pfam matches is reduced, suggesting that continued addition of new families is essential to maintain its relevance.
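The figures quoted in the abstract can be combined in a back-of-envelope calculation. The sketch below is illustrative only and is not the authors' analysis: it treats "over 10,000 entries" as exactly 10,000 families, and the function name `coverage_gap_summary` and its parameters are hypothetical. It computes the gap between current and upper-bound coverage, the implied total number of families, and the average coverage contributed per family before and after.

```python
def coverage_gap_summary(current_pct, target_pct, current_families, extra_families):
    """Summarise the coverage figures quoted in the abstract.

    All inputs are taken from the abstract; treating "over 10,000" as
    exactly 10,000 families is an assumption made for illustration.
    """
    gap_points = target_pct - current_pct               # percentage points still uncovered
    total_families = current_families + extra_families  # families needed to reach the target
    # Average coverage (in percentage points) contributed per family.
    per_existing_family = current_pct / current_families
    per_new_family = gap_points / extra_families
    return {
        "gap_points": gap_points,
        "total_families": total_families,
        "per_existing_family": per_existing_family,
        "per_new_family": per_new_family,
        # How many times more coverage an average existing family
        # contributes compared with an average future family.
        "diminishing_return_ratio": per_existing_family / per_new_family,
    }


summary = coverage_gap_summary(
    current_pct=72.0,         # share of known sequences Pfam matches today
    target_pct=95.0,          # likely upper bound, from proteins with known structure
    current_families=10_000,  # assumed round figure for "over 10,000 entries"
    extra_families=28_000,    # further families the authors estimate are needed
)

for key, value in summary.items():
    print(f"{key}: {value}")
```

Under these assumptions, closing the remaining 23 percentage points takes nearly three times as many families as exist today, so an average new family would cover far fewer sequences than an average existing one. That is consistent with the abstract's point that coverage is hard to maintain as the sequence databases grow.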


