Nucleic Acids Res. 2016 Jan 4;44(D1):D279-85.
doi: 10.1093/nar/gkv1344. Epub 2015 Dec 15.

The Pfam protein families database: towards a more sustainable future


Robert D Finn et al. Nucleic Acids Res. 2016.

Abstract

In the last two years the Pfam database (http://pfam.xfam.org) has undergone a substantial reorganisation to reduce the effort involved in making a release, thereby permitting more frequent releases. Arguably the most significant of these changes is that Pfam is now primarily based on the UniProtKB reference proteomes, and the counts of matched sequences and species reported on the website are restricted to this smaller set. Building families on reference proteome sequences brings greater stability, which decreases the amount of manual curation required to maintain them. It also reduces the number of sequences displayed on the website, whilst still providing access to many important model organisms. Matches to the full UniProtKB database are, however, still available, and Pfam annotations for individual UniProtKB sequences can still be retrieved. Some Pfam entries (1.6%) with no matches to reference proteomes have been retained; we are working with UniProt to see whether sequences from them can be incorporated into reference proteomes. Pfam-B, the automatically generated supplement to Pfam, has been removed. The current release (Pfam 29.0) includes 16 295 entries and 559 clans. A new tool improves the facility to view the relationships between families within a clan.
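Restricting the website counts to reference proteomes can be pictured as a filter-then-count step. The sketch below is an illustration under stated assumptions, not Pfam's actual pipeline: the `Match` record layout, its field names, the `reference_counts` function and the proteome identifiers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """One Pfam hit on one UniProtKB sequence (hypothetical record layout)."""
    family: str    # Pfam family label
    sequence: str  # UniProtKB sequence accession
    proteome: str  # identifier of the proteome the sequence belongs to
    species: str   # species name or taxon ID


def reference_counts(matches, reference_proteomes):
    """Count distinct sequences and species per family, keeping only
    matches whose proteome is in the reference set (hypothetical helper)."""
    seqs = {}
    species = {}
    for m in matches:
        if m.proteome not in reference_proteomes:
            continue  # hits outside reference proteomes are not counted
        seqs.setdefault(m.family, set()).add(m.sequence)
        species.setdefault(m.family, set()).add(m.species)
    # (number of sequences, number of species) for each family with reference hits
    return {fam: (len(seqs[fam]), len(species[fam])) for fam in seqs}


if __name__ == "__main__":
    # Illustrative records only; not real Pfam or UniProt data.
    matches = [
        Match("FAM_A", "SEQ1", "UP_A", "species 1"),
        Match("FAM_A", "SEQ2", "UP_B", "species 2"),
        Match("FAM_A", "SEQ3", "NONREF", "species 3"),
        Match("FAM_B", "SEQ4", "NONREF", "species 3"),
    ]
    print(reference_counts(matches, {"UP_A", "UP_B"}))  # {'FAM_A': (2, 2)}
```

In this sketch a family whose matches all lie outside the reference set simply drops out of the counts, which is analogous to the 1.6% of entries mentioned above; in Pfam those entries are retained and their full UniProtKB matches remain available.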


Figures

Figure 1.
Example of the improved relationship graph, showing the similarity between the Pfam entries within a clan. This example shows the relationships between the entries in the Glutaminase I clan (accession: CL0014). Each entry in the clan is a node in the graph and is drawn as a circle whose diameter is proportional to the number of sequences in the full alignment. Edges connect nodes based on the HHsearch results between clan members, with edge width proportional to the E-value of the HHsearch similarity (E-values ≤ 0.01 are deemed significant). The clanviewer component has been included in the BioJS registry (http://biojs.io/d/clanviewer) and its code is freely available on GitHub (https://github.com/ProteinsWebTeam/clanviewer). In this clan, three entries (ThuA (PF06283), GATaseI_like (PF07090) and Glyco_hydro_42M (PF08532)) form a disconnected sub-cluster, and DUF4159 (PF13709) is not connected to any other entry. These entries are nonetheless included in the clan on the basis of their structural similarity to other entries in the clan.
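The disconnected sub-cluster and the isolated DUF4159 node in Figure 1 follow from drawing edges only for significant HHsearch hits. The following minimal sketch assumes pairwise E-values are already available as `(family, family, evalue)` tuples, applies the caption's E-value ≤ 0.01 cutoff, and finds connected components by depth-first search. The function name `clan_components`, the family labels and the E-values in the example are hypothetical, not real HHsearch output or clanviewer code.

```python
SIGNIFICANT_E = 0.01  # significance cutoff stated in the Figure 1 caption


def clan_components(members, hits, threshold=SIGNIFICANT_E):
    """Group clan members into connected components, linking two members
    only when their HHsearch E-value is <= threshold (hypothetical helper).

    members: iterable of family names in the clan
    hits: iterable of (family_a, family_b, evalue) tuples
    """
    adj = {m: set() for m in members}
    for a, b, e in hits:
        if e <= threshold and a in adj and b in adj and a != b:
            adj[a].add(b)  # undirected edge
            adj[b].add(a)

    seen = set()
    components = []
    for start in adj:  # dicts keep insertion order, so output order is stable
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        comp = []
        while stack:  # iterative depth-first search
            node = stack.pop()
            comp.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(sorted(comp))
    return components


if __name__ == "__main__":
    # Illustrative values only; not real HHsearch results.
    members = ["A", "B", "C", "D", "E", "F"]
    hits = [
        ("A", "B", 1e-5),
        ("B", "C", 0.005),
        ("D", "E", 1e-8),
        ("C", "D", 0.5),   # not significant: no edge
        ("E", "F", 0.02),  # not significant: no edge
    ]
    print(clan_components(members, hits))  # [['A', 'B', 'C'], ['D', 'E'], ['F']]
```

Here "F" ends up alone, much as DUF4159 does in the figure. Connectivity in this graph does not decide clan membership: as the caption notes, structural evidence can justify including an entry that has no significant HHsearch edge.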

Similar articles

  • UniProt and Mass Spectrometry-Based Proteomics-A 2-Way Working Relationship.
    Bowler-Barnett EH, Fan J, Luo J, Magrane M, Martin MJ, Orchard S; UniProt Consortium. Mol Cell Proteomics. 2023 Aug;22(8):100591. doi: 10.1016/j.mcpro.2023.100591. Epub 2023 Jun 8. PMID: 37301379. Free PMC article. Review.
  • The Pfam protein families database in 2019.
    El-Gebali S, Mistry J, Bateman A, Eddy SR, Luciani A, Potter SC, Qureshi M, Richardson LJ, Salazar GA, Smart A, Sonnhammer ELL, Hirsh L, Paladin L, Piovesan D, Tosatto SCE, Finn RD. Nucleic Acids Res. 2019 Jan 8;47(D1):D427-D432. doi: 10.1093/nar/gky995. PMID: 30357350. Free PMC article.
  • Pfam: the protein families database.
    Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, Eddy SR, Heger A, Hetherington K, Holm L, Mistry J, Sonnhammer EL, Tate J, Punta M. Nucleic Acids Res. 2014 Jan;42(Database issue):D222-30. doi: 10.1093/nar/gkt1223. Epub 2013 Nov 27. PMID: 24288371. Free PMC article.
  • Pfam 10 years on: 10,000 families and still growing.
    Sammut SJ, Finn RD, Bateman A. Brief Bioinform. 2008 May;9(3):210-9. doi: 10.1093/bib/bbn010. Epub 2008 Mar 15. PMID: 18344544. Review.
  • The Pfam protein families database.
    Finn RD, Tate J, Mistry J, Coggill PC, Sammut SJ, Hotz HR, Ceric G, Forslund K, Eddy SR, Sonnhammer EL, Bateman A. Nucleic Acids Res. 2008 Jan;36(Database issue):D281-8. doi: 10.1093/nar/gkm960. Epub 2007 Nov 26. PMID: 18039703. Free PMC article.


