Setting up an account

To set up an account, you must provide contact details. The person who provides these details, the contact person, is the person we will contact if we have any questions about the submission, and the person to whom the accession number will be sent. The contact person does not need to be the same as the contact person for any associated publication. The contact details will not be added to the UniProt Knowledgebase entry.

The following fields are mandatory:

  • First name
  • Surname
  • Email address
  • Password (and Password confirmation)
  • Address
  • Country
  • Telephone - please include the country code

If you do not receive an e-mail shortly after entering your personal details, it is likely that the e-mail address you provided is incorrect. An error in the e-mail address can delay the release of accession numbers.

Submitting a sequence

Click Create a new submission to start the submission process. This opens a web form and you will be asked to provide the following information:

Required:

  • Protein name
  • Sequencing method
  • Organism
  • Sequence
  • Citations
  • Confidentiality

Optional:

  • Properties of the protein
  • Sequence features of the protein

You will be expected to complete all required fields.

Required information

Protein name

Please give the name(s) of the protein.

Examples:

  • Neurotoxin 4 (Tf4).
  • Unknown protein from 2D-PAGE of liver tissue.
  • Aspartylglucosaminidase alpha subunit.

Sequencing method

Please select the method you used to determine the sequence you are about to submit. If none of the provided options fits your experimental approach please select the option 'Other' and provide a short free-text description.

Organism

  • Scientific name

    The scientific name of the organism from which the sequence was obtained.

    Example: Homo sapiens

  • Common name

    The commonly used name(s) of the organism from which the sequence was obtained.

    Example: Human

  • Taxonomy ID

    The NCBI Taxonomy ID of the organism from which the sequence was obtained.

  • Strain

    The strain, cultivar or variety from which the sequence was obtained.

    Examples:

    • ATCC 10832
    • cv. Granny Smith
  • Tissue

    The tissue or cell type the protein was purified from. You may also include developmental stage information.

    Examples:

    • Venom
    • Erythrocyte
    • Embryonic liver

Citations

You should ideally submit your sequence data before you have galley proofs. However, if your manuscript has already been accepted for publication, the accession number can be included at the galley proof stage as a note added in proof.

We suggest that the following text be used to cite the accession number(s) in publication(s): "The protein sequence data reported in this paper will appear in the UniProt Knowledgebase under the accession number(s) xxxxxx."

You must choose between the following citation types:

  • Unpublished

    If you do not plan to publish the sequence elsewhere, please provide a list of author names (a minimum of one name is required) and a title for the submission. The title should be similar to one you would use if you published the sequence in a journal.

  • Published Journal Article

    Please give the title, author list, journal name, volume, pages and the year of publication. If the PubMed ID is known, please provide it as well.

  • Unpublished Journal Article

    Unpublished journal articles have the same fields as a published journal article, with the exception of publication year and PubMed ID. Fill in as much information as you currently have available. Mandatory field restrictions are relaxed for this citation type.

  • Thesis

    Please give the title of the thesis, the author, year (if known), institution name and the country.

Sequence

Please select whether you are submitting a complete sequence or one or more fragments of a particular protein. If you are submitting fragments, you can use checkboxes to indicate whether the first fragment is the N-terminal one and whether the order of the fragments is known. Enter the sequence in the box provided using the single-letter amino acid code.
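
Before pasting a sequence into the form, it can help to check that it contains only single-letter amino acid codes. The following is a minimal sketch in Python and is not part of SPIN. The function name find_invalid_residues is hypothetical, and the sketch assumes that only the 20 standard amino acid letters are expected; the set of characters SPIN actually accepts is not stated here.

```python
from typing import List, Tuple

# Hypothetical pre-submission check; not part of SPIN.
# Assumption: only the 20 standard single-letter amino acid codes are expected.
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def find_invalid_residues(sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based position, character) for each non-standard character.

    Whitespace is ignored, so a sequence copied in blocks or across
    several lines can be checked as-is. Positions refer to the sequence
    with whitespace removed.
    """
    cleaned = "".join(sequence.split()).upper()
    return [
        (position, residue)
        for position, residue in enumerate(cleaned, start=1)
        if residue not in STANDARD_AMINO_ACIDS
    ]
```

An empty list means that every character is one of the 20 standard codes. Any reported position can then be corrected, or described as an uncertainty in the sequence annotations.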

Confidentiality

Depending on your preferences, your data can either be released immediately after being checked by a curator, or kept confidential initially. Please note that the data cannot be kept confidential once published in a journal or thesis. There are three options for the release date:

  • Data may be published without further notice

    The entry will be released after you have been provided with the accession number(s). Please note that the entry will not appear in the database immediately after we send you the accession number.

  • Data must be kept confidential until publication

    The entry will be released when it is published in a journal. Please contact us when your paper is published, giving full citation details, to ensure that we release your entry at the appropriate time.

  • Data may be made public after the specified date

    The entry will be released on or after the date you provided. If you publish your data before the date you gave us, please let us know and we can make sure that the submitted data is released at the appropriate time. You may change the provided date by contacting data submissions by e-mail.

Optional information

Protein properties

The information you provide will be annotated by a scientific curator before it is entered in the database, so don't worry if you are unsure which category to put the data in. You may for example use the Other Information category for the information that doesn't fit anywhere else.

  • Mass spectrometry

    Please give the result and accuracy in Daltons, and choose the method you used from the drop-down menu. Please let us know in the comment field if the result is not of the whole protein, but a fragment. If you give several results for one peptide, please specify what causes the difference in the mass. Only experimental data may be entered here.

    Examples:

    • Result: 10565, Accuracy: 2, Method: Electrospray
    • Result: 4378.90, Method: MALDI, Note: Phosphorylated form
  • Function

    What does the protein do?

    Examples:

    • Has antibacterial activity against Gram-positive bacterium M. luteus
    • Conjugation of reduced glutathione to a wide number of exogenous and endogenous hydrophobic electrophiles. Has a strong specific activity toward 1-chloro-2,4-dinitrobenzene and ethacrynic acid
    • Binds heavy metals. May function as a carrier of divalent cations in plasma
  • Tissue specificity

    Is this protein specifically found in certain tissues?

    Examples:

    • Fruits
    • Expressed strongly in roots, weakly in stems. Not found in leaves
    • Expressed by the venom gland
    • Widely distributed, highest concentrations found in brain, brain cortex and kidney
    • Stored in hemocyte granules and secreted into the hemolymph

  • Similarity

    Have you found similarities between this protein sequence and some other proteins?

    Examples:

    • Similar to Kunitz-type protease inhibitors
    • Similar to sea anemone sodium channel inhibitory toxins
  • EC number

    The EC number does not need to be complete, but please provide as much as you can.

    Examples:

    • EC number 6.2.1.-
    • EC number 1.97.1.1
  • Catalytic activity

    Do you know which reaction the enzyme catalyses?

    Examples:

    • D-mannose 6-phosphate = D-fructose 6-phosphate
    • Preferential cleavage: Ala-|-Xaa > Gln-|-Xaa > Tyr-|-Xaa >> Leu-|-Xaa > Gly-|-Xaa
  • Cofactor

    Do you know if a cofactor is required for enzyme activity?

    Examples:

    • Binds one 4Fe-4S cluster per dimer
    • Iron and ascorbate
  • Pathway

    Do you know if the enzyme functions in a specific metabolic pathway?

    Examples:

    • Proline biosynthesis; second step
    • Might catalyse the first step in the fatty acid beta oxidation pathway to form an alkanoyl-CoA intermediate
  • Enzyme regulation

    Do you know what activates or inhibits the enzyme?

    Examples:

    • Inhibited by E-64
    • Activated by calcium and inhibited by zinc
  • Vmax

    Do you know the maximal velocity of the reaction catalyzed by this enzyme (Vmax)? Additional information may be provided in the notes field.

    Examples:

    • 0.46 mM/min/mg
    • 57 umol/h/mg towards lactose at 40 degrees Celsius
    • 39.0 pM/min/ug in the presence of 0.7 M NaCl
  • KM data

    Do you know the Michaelis-Menten constant (KM) for this enzyme? You must state the substrate used to determine the KM.

    Examples:

    • 0.56 mM for D-glucose
    • 8.97 nM for ADP
  • Quaternary structure

    Does the protein consist of one peptide or several?

    Examples:

    • Monomer
    • Heterotetramer of two alpha and two beta chains or heterodimer of an alpha and a beta chain
  • Allergenicity

    Is the protein an allergen?

    Examples:

    • Allergen of horse dander
  • Subcellular location

    Where is the protein located, within the cell or outside?

    Examples:

    • Attached to the membrane by a GPI-anchor
    • Secreted protein
    • Cytoplasm
  • Posttranslational modification

    Is the protein modified after it is synthesised?

    Examples:

    • Glycosylated
    • Four disulfide bonds are probably present
    • Ala-1 is partially carbamylated
    • Phosphorylated in immature sperm. Dephosphorylated in mature sperm allowing a stronger interaction with DNA
  • Induction

    Do you know what circumstances cause the protein to be synthesized?

    Examples:

    • By bacterial infection
    • By heavy metal ions
    • By interferon gamma. A diverse population of cell types rapidly increases transcription of mRNA encoding this protein. This suggests that gamma-induced protein may be a key mediator of the interferon gamma response
  • Developmental stage

    Some proteins appear at specific moments in the development of an organism.

    Examples:

    • Expressed throughout all walled developmental stages
    • Unfolded and fully developed leaves
    • Present in seedlings but not in mature plants
    • First appears in the mid-log phase. Present at highest concentration in the late-log early-stationary phase before gradually diminishing. Significant amounts are still present after 30 hours
  • Optimum temperature

    Do you know the optimum temperature for this enzyme, the variation of enzyme activity with temperature, or the thermostability of the enzyme?

    Examples:

    • Optimum temperature is 70 degrees Celsius
    • Thermostable
    • Optimum temperature is 23-38 degrees Celsius. Still active after heating at 100 degrees Celsius for 20 minutes
  • Optimum pH

    Do you know the optimum pH for this enzyme, or variation of enzyme activity with variation in pH?

    Examples:

    • Optimum pH is 5.5. Stable above pH 5.5
    • Optimum pH is 7-8
    • Optimum pH is 6.0. Active from pH 4.5 to 10.5
  • Redox potential

    Do you know the standard redox potential for this protein?

    Examples:

    • E(0) is +150 mV
    • E(0) is -450 mV for the 3Fe-4S, and -645 mV for the 4Fe-4S clusters
  • Absorption

    Is your protein photoreactive? Does it absorb a particular wavelength? Please don't list absorbance values due to cytochromes, flavin groups in flavin-binding proteins or pyridoxal phosphate in PLP-containing proteins.

    Examples:

    • Abs(max)=353 nm
    • Abs(max)=~482 nm
  • 2D-PAGE results

    Did you identify the protein from a 2D-PAGE gel? Please give the mass in kilodaltons and the pI. Only experimental data may be entered here.

    Examples:

    • pI 4.74, Molecular weight 25.7 kDa
  • Miscellaneous

    Do you have any other information to give about the protein? If applicable, add the source of the data.

    Examples:

    • LD(50) is 41.1 mg/kg by subcutaneous injection
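
As an illustration of the EC number field described above, the following Python sketch checks that a complete or partial EC number has a plausible shape before it is entered. It is not part of SPIN, and the function name is_plausible_ec_number is hypothetical. The sketch assumes four dot-separated fields in which unknown trailing fields are written as "-", as in the example 6.2.1.-; it takes the number itself as input, without any "EC number" prefix, and it does not check whether the number actually exists in the enzyme nomenclature.

```python
import re

# Hypothetical helper; not part of SPIN.
# Assumption: an EC number has four dot-separated fields, and unknown
# trailing fields are written as "-", as in "6.2.1.-".
_EC_FIELD = re.compile(r"\d+|-")


def is_plausible_ec_number(text: str) -> bool:
    """Check the shape of a complete or partial EC number."""
    fields = text.strip().split(".")
    if len(fields) != 4:
        return False
    if not all(_EC_FIELD.fullmatch(field) for field in fields):
        return False
    # The first field (the enzyme class) must be known.
    if fields[0] == "-":
        return False
    # Once a field is unknown, every later field must be unknown too.
    first_dash = fields.index("-") if "-" in fields else len(fields)
    return all(field == "-" for field in fields[first_dash:])
```

With this sketch, both 6.2.1.- and 1.97.1.1 are accepted, while 6.2.1 (too few fields) and 6.-.1.1 (a known field after an unknown one) are rejected.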

Sequence annotations

Are there any interesting features in the sequence you can tell us about?

Click on "add a feature" and choose the appropriate description from the drop-down list. If the feature is not on the list, select 'Other domain of interest' or 'Other site of interest'.

  • Uncertainty in the sequence

    Are you unsure of the identity of an amino acid, or the order of some residues? Only experimentally determined information should be entered here.

    Examples:

    • You have entered residue 4 in your sequence as I, but are unsure if it is I or L:
      Start = 4, End = 4, Comments = I or L
    • You have entered residues 2-4 in your sequence as GHT, but are unsure if the sequence is GHT, GTH, or HTG:
      Start = 2, End = 4, Comments = GHT or GTH or HTG
  • Post-translationally modified residue

    Is the sequence posttranslationally modified? Please give the chemical nature of the modification in the comment field. The most common modifications are: Acetylation, Amidation, Formylation, Pyrrolidone carboxylic acid, Hydroxylation, Methylation, Phosphorylation and Sulfation. Exception: Glycosylation should be described using the annotation type 'Glycosylation site'.

    Examples:

    • Acetylation of the N-terminus:
      Start = 1, End = 1, Comments = Acetylation
    • 4-Hydroxyproline at position 10:
      Start = 10, End = 10, Comments = 4-hydroxyproline
  • Disulfide bond

    Are some of the residues linked by an intra-chain or inter-chain disulfide bond?

    Examples:

    • An intrachain disulfide bond links residues 5 and 16:
      Start = 5, End = 16
    • An interchain disulfide bond links residue 3 with another polypeptide chain called alpha chain:
      Start = 3, End = 3, Comments = Interchain with alpha chain
  • Active site

    If your protein is an enzyme, do you know which amino acids are involved in the activity of the enzyme? If known, the role of the active site (proton donor, nucleophile, charge relay system etc) should be added in the Comment field.

    Examples:

    • Residue 6 is known to be the proton donor:
      Start = 6, End = 6, Comments = Proton donor
    • Residue 10 is an active site, but the role is uncertain:
      Start = 10, End = 10
  • Glycosylation site

    Is the protein glycosylated? If you know more details, such as the type of the linkage, or the nature of the sugar (or the reducing terminal sugar), please add these to the Comment field.

    N-linked:
    This is the most prevalent carbohydrate linkage, which involves the addition of oligosaccharides to asparagine residues of secreted or membrane-bound proteins. The modification requires the following sequence motif: Asn-Xaa-Ser/Thr, where Xaa cannot be a proline residue.

    O-linked:
    Glycans attached to serine, threonine and, to a lesser extent, hydroxyproline and hydroxylysine. Occurs on secreted and membrane-bound proteins, but also on cytoplasmic and nuclear proteins. The most common type of glycan linked in this fashion is N-acetylgalactosamine and, in rare cases, can be mannose, fucose, glucose, galactose or xylose.

    C-linked:
    C-glycosylation involves the mannosylation of specific tryptophan residues.

    Examples:

    • Residue 37 is glycosylated, with an N-linked polysaccharide which is high in mannose:
      Start = 37, End = 37, Comments = N-linked high mannose polysaccharide
    • Residue 15 contains an O-linked N-acetylgalactosamine:
      Start = 15, End = 15, Comments = O-linked N-acetylgalactosamine
  • Metal ion-binding site

    Does the protein bind a metal ion? Please state in the comments field which metal is bound.

    Examples:

    • Iron is bound at residue 52:
      Start = 52, End = 52, Comments = Iron
    • A divalent metal cation is bound at residue 12, but the exact identity of the metal is unknown:
      Start = 12, End = 12, Comments = Divalent metal cation
  • Nucleotide-binding region

    Does the sequence bind a nucleotide phosphate? Please indicate the nature of the nucleotide phosphate in the Comment field.

    Examples:

    • ATP is bound from residues 3 to 9:
      Start = 3, End = 9, Comments = ATP
    • NADP is bound to residues 4 to 7 via its ribose part:
      Start = 4, End = 7, Comments = NADP (ribose part)
  • DNA-binding region

    Does the sequence contain a DNA binding domain? If known, please give the nature of the DNA-binding region in the comment field.

    Examples:

    • DNA is bound from residues 2 to 13:
      Start = 2, End = 13
    • DNA is bound to a Myb domain from residues 35 to 86:
      Start = 35, End = 86, Comments = Myb domain
  • Other binding site

    Does the protein have a covalent bond or a specific strong interaction with a chemical group (co-enzyme, prosthetic group, etc.)? Please state what is bound in the comments field.

    Examples:

    • Residue 20 binds to the substrate:
      Start = 20, End = 20, Comments = Substrate
    • Residue 17 binds heme covalently:
      Start = 17, End = 17, Comments = Heme (covalent)
    • A cysteine at residue 27 is S-palmitoylated:
      Start = 27, End = 27, Comments = S-palmitoyl cysteine
  • Transmembrane region

    Are there any transmembrane regions?

    Examples:

    • Transmembrane region from 4 to 24:
      Start = 4, End = 24
    • Transmembrane region from 12 to 32, N-terminus is extracellular:
      Start = 12, End = 32, Comments = N-terminus is extracellular
  • Other domain of interest

    Are there any other domains of interest within the protein? Please give details in the comments field.

    Examples:

    • Calcium binding from 3 to 13:
      Start = 3, End = 13, Comments = Calcium binding
    • Residues 15 to 29 have been found to be essential for toxicity:
      Start = 15, End = 29, Comments = Essential for toxicity
  • Other site of interest

    Are there any other sites of interest (single residues, or pairs of residues) within the protein? Please give details in the comments field.

    Examples:

    • Reactive bond for trypsin from 3 to 4:
      Start = 3, End = 4, Comments = Reactive bond for trypsin
    • Residue 13 is important for glycine-binding:
      Start = 13, End = 13, Comments = Important for glycine-binding
  • Sequence variation

    Are there any natural variations in the sequence? This should only be used for natural variation. Please enter the position of the variant, and the alternative residue. Please give details in the comments field.

    Examples:

    • Position 4 is replaced by a T in allele 2:
      Location = 4, Variant residue = T, Comments = allele 2
    • Position 8 is not present in strain B:
      Location = 8, Variant residue = -, Comments = strain B
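
The N-linked glycosylation motif described above (Asn-Xaa-Ser/Thr, where Xaa is not proline) can be located in a sequence with a short script. The following Python sketch is not part of SPIN, and the function name find_n_glycosylation_motifs is hypothetical. It assumes a sequence in single-letter code without whitespace. A motif match is not evidence of glycosylation; it can only help to confirm that a position you determined experimentally is consistent with the motif.

```python
import re
from typing import List

# Hypothetical helper; not part of SPIN.
# Motif from the text above: Asn-Xaa-Ser/Thr, where Xaa is not Pro.
# A lookahead is used so that overlapping motifs (e.g. in "NNSS") are all found.
_SEQUON = re.compile(r"N(?=[^P][ST])")


def find_n_glycosylation_motifs(sequence: str) -> List[int]:
    """Return the 1-based positions of Asn residues within an N-X-S/T motif."""
    return [match.start() + 1 for match in _SEQUON.finditer(sequence.upper())]
```

Each returned position could be used as both Start and End of a 'Glycosylation site' feature, provided that the glycosylation itself has been confirmed.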

Evidence tags

We use evidence tags to document where each piece of information originated. Please tell us whether the data you provide are:

  • Experimental: experiments done by the submitting group
  • By similarity: derived from similarity to other proteins
  • Other: opinion of submitter or several sources combined. Please specify the sources of the data in the free text description.

FAQ

  • How do I update previously submitted data?

    SPIN may be used for submitting new sequencing data to a publicly available UniProt Knowledgebase entry. If you want to correct/modify data you have submitted previously, or add new data to a confidential entry, please e-mail the changes to uniprot-submissions@ebi.ac.uk. Please use the update form to update other publicly available UniProtKB/Swiss-Prot entries or email help@uniprot.org.

  • How long will it take to get an accession number?

    We will process data submissions within 7 working days of receipt and send submitters notification of either which accession number(s) their data have been assigned or what additional information is needed. There are several things authors can do to minimize the time it takes to get an accession number:

    • Be sure that submissions include all the necessary information.
    • Check the data to be sure that they do not contain inconsistencies/errors.
    • Be sure to include the correct e-mail address.
  • What is the SPIN ID?

    You are provided with a unique Submission identifier number (SPIN ID) for your submission every time you start a new session from the SPIN home page. This number allows the system to keep track of your submissions and sequences. This SPIN ID number is not an accession number. You will receive an accession number once your submission has been processed by a curator. Once you have received an accession number you should cite it in all further communication with EBI.

  • How do I submit several proteins from one organism?

    Use a separate entry form for each protein/peptide.

  • How do I submit the same protein from several organisms?

    Use a separate entry form for each organism.

  • How do I submit several fragments from one protein/peptide (one organism)?

    Use one entry form for all fragments.

  • How do I submit several fragments that may be from different proteins?

    Use a separate entry form for each fragment.

  • How do I submit several fragments from different proteins?

    Use a separate entry form for each fragment.

  • What are the browser requirements?

    We recommend using the latest version of Firefox, Chrome, Safari, Opera or Internet Explorer. JavaScript must be enabled.

  • What is SPIN?

    SPIN is the Internet tool for the submission of directly sequenced protein sequences to the UniProt Knowledgebase.

  • Where is SPIN?

    SPIN is a service offered by the UniProt teams at the European Molecular Biology Laboratory - The European Bioinformatics Institute (EMBL-EBI) located in Cambridge (United Kingdom).

  • Who is responsible for SPIN?

    For any problems please contact uniprot-submissions@ebi.ac.uk.
