UniRef: comprehensive and non-redundant UniProt reference clusters
- PMID: 17379688
- DOI: 10.1093/bioinformatics/btm098
Abstract
Motivation: Redundant protein sequences in biological databases hinder sequence similarity searches and make interpretation of search results difficult. Clustering of protein sequence space based on sequence similarity helps organize all sequences into manageable datasets and reduces sampling bias and overrepresentation of sequences.
Results: The UniProt Reference Clusters (UniRef) provide clustered sets of sequences from the UniProt Knowledgebase (UniProtKB) and selected UniProt Archive records to obtain complete coverage of sequence space at several resolutions while hiding redundant sequences. Currently covering >4 million source sequences, the UniRef100 database combines identical sequences and subfragments from any source organism into a single UniRef entry. UniRef90 and UniRef50 are built by clustering UniRef100 sequences at 90% or 50% sequence identity, respectively. UniRef100, UniRef90 and UniRef50 reduce database size by approximately 10, 40 and 70%, respectively, relative to the source sequence set. The reduced redundancy increases the speed of similarity searches and improves detection of distant relationships. UniRef entries contain summary cluster and membership information, including the sequence of a representative protein, the member count and common taxonomy of the cluster, the accession numbers of all merged entries and links to rich functional annotation in UniProtKB to facilitate biological discovery. UniRef has already been applied to broad research areas ranging from genome annotation to proteomics data analysis.
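The abstract describes two clustering stages: folding identical sequences and subfragments into single entries (UniRef100), then clustering those entries at 90% and 50% identity (UniRef90, UniRef50). The toy Python sketch below illustrates that flow under stated assumptions. The names `Cluster`, `merge_identical_and_subfragments`, `identity` and `cluster_at` are hypothetical. The identity score is a crude `difflib` stand-in for alignment-based percent identity, and the greedy longest-first scheme is modeled loosely on CD-HIT-style incremental clustering; neither is taken from this abstract.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Cluster:
    """Hypothetical stand-in for a UniRef entry: a representative plus members."""
    representative: str               # accession of the representative sequence
    rep_seq: str                      # sequence of the representative
    members: list[str] = field(default_factory=list)


def merge_identical_and_subfragments(seqs: dict[str, str]) -> list[Cluster]:
    """UniRef100-like step (toy): fold identical sequences and exact
    subfragments into the cluster of a longer or equal sequence containing them."""
    clusters: list[Cluster] = []
    # Longest first, so every fragment is seen after the sequence that contains it.
    for acc in sorted(seqs, key=lambda a: (-len(seqs[a]), a)):
        seq = seqs[acc]
        for c in clusters:
            if seq in c.rep_seq:          # identical, or an exact substring
                c.members.append(acc)
                break
        else:
            clusters.append(Cluster(acc, seq, [acc]))
    return clusters


def identity(a: str, b: str) -> float:
    """Assumed crude identity: residues covered by difflib matching blocks,
    divided by the length of the shorter sequence (not UniRef's own metric)."""
    if not a or not b:
        return 0.0
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / min(len(a), len(b))


def cluster_at(entries: list[Cluster], threshold: float) -> list[Cluster]:
    """Greedy longest-first clustering of representatives at an identity
    threshold, in the spirit of CD-HIT. The input clusters are not modified."""
    out: list[Cluster] = []
    for c in sorted(entries, key=lambda e: (-len(e.rep_seq), e.representative)):
        for target in out:
            if identity(c.rep_seq, target.rep_seq) >= threshold:
                target.members.extend(c.members)   # join the first close-enough cluster
                break
        else:
            out.append(Cluster(c.representative, c.rep_seq, list(c.members)))
    return out


if __name__ == "__main__":
    p1 = "MKTAYIAKQRQISFVKSHFSRQ"            # made-up example sequence
    seqs = {
        "P1": p1,
        "P2": p1,                            # identical to P1
        "P3": p1[3:13],                      # subfragment of P1
        "P4": p1[:-1] + "A",                 # one substitution (~95% identity)
        "P5": "G" * 4 + "W" * 4 + "P" * 4,   # unrelated
        "P6": p1[:13] + "W" * 9,             # ~59% identity to P1
    }
    u100 = merge_identical_and_subfragments(seqs)
    # Per the abstract, both coarser levels are built from the UniRef100 entries.
    u90 = cluster_at(u100, 0.9)
    u50 = cluster_at(u100, 0.5)
    for name, level in (("UniRef100-like", u100), ("UniRef90-like", u90), ("UniRef50-like", u50)):
        print(name, len(level), [(c.representative, c.members) for c in level])
```

On these six made-up sequences the three levels yield 4, 3 and 2 entries, showing how each coarser threshold hides more redundancy. The sketch simply picks the longest sequence as each cluster's representative; the abstract does not say how UniRef actually chooses representatives.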
Availability: UniRef is updated every two weeks and is available for online search and retrieval at http://www.uniprot.org, as well as for download at ftp://ftp.uniprot.org/pub/databases/uniprot/uniref.
Supplementary information: Supplementary data are available at Bioinformatics online.
Similar articles
- UniProt and Mass Spectrometry-Based Proteomics-A 2-Way Working Relationship. Mol Cell Proteomics. 2023 Aug;22(8):100591. doi: 10.1016/j.mcpro.2023.100591. Epub 2023 Jun 8. PMID: 37301379. Free PMC article. Review.
- UniProtKB/Swiss-Prot. Methods Mol Biol. 2007;406:89-112. doi: 10.1007/978-1-59745-535-0_4. PMID: 18287689.
- In silico characterization of proteins: UniProt, InterPro and Integr8. Mol Biotechnol. 2008 Feb;38(2):165-77. doi: 10.1007/s12033-007-9003-x. Epub 2007 Oct 4. PMID: 18219596. Review.
- The Universal Protein Resource (UniProt): an expanding universe of protein information. Nucleic Acids Res. 2006 Jan 1;34(Database issue):D187-91. doi: 10.1093/nar/gkj161. PMID: 16381842. Free PMC article.
- UniProt: the Universal Protein knowledgebase. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D115-9. doi: 10.1093/nar/gkh131. PMID: 14681372. Free PMC article.
Cited by
- SOFB is a comprehensive ensemble deep learning approach for elucidating and characterizing protein-nucleic-acid-binding residues. Commun Biol. 2024 Jun 3;7(1):679. doi: 10.1038/s42003-024-06332-0. PMID: 38830995. Free PMC article.
- The gut microbiota in persistent post-operative pain following breast cancer surgery. Sci Rep. 2024 May 30;14(1):12401. doi: 10.1038/s41598-024-62397-1. PMID: 38811609. Free PMC article.
- Microbial polyphenol metabolism is part of the thawing permafrost carbon cycle. Nat Microbiol. 2024 Jun;9(6):1454-1466. doi: 10.1038/s41564-024-01691-0. Epub 2024 May 28. PMID: 38806673. Free PMC article.
- Divergence within the Taxon 'Candidatus Phytoplasma asteris' Confirmed by Comparative Genome Analysis of Carrot Strains. Microorganisms. 2024 May 17;12(5):1016. doi: 10.3390/microorganisms12051016. PMID: 38792845. Free PMC article.
- Spatio-temporal dynamics of the human small intestinal microbiome and its response to a synbiotic. Gut Microbes. 2024 Jan-Dec;16(1):2350173. doi: 10.1080/19490976.2024.2350173. Epub 2024 May 13. PMID: 38738780. Free PMC article. Clinical Trial.