Database (Oxford). 2024 May 27:2024:baae041.
doi: 10.1093/database/baae041.

PDB NextGen Archive: centralizing access to integrated annotations and enriched structural information by the Worldwide Protein Data Bank

Preeti Choudhary et al. Database (Oxford). 2024.

Abstract

The Protein Data Bank (PDB) is the global repository for public-domain, experimentally determined 3D biomolecular structural information. Because the PDB is archival, updating or adding annotations from trusted external biodata resources is challenging. While each Worldwide PDB (wwPDB) partner has made best efforts to provide up-to-date external annotations, accessing and integrating information from disparate wwPDB data centers can be an involved process. To address this issue, the wwPDB has established the PDB Next Generation (NextGen) Archive, which centralizes and streamlines access to enriched structural annotations from wwPDB partners and trusted external sources. At present, the NextGen Archive provides mappings between experimentally determined 3D structures of proteins and UniProt amino acid sequences; domain annotations from the Pfam, SCOP2 and CATH databases; and intra-molecular connectivity information. Since launch, the PDB NextGen Archive has seen substantial user engagement, with over 3.5 million data file downloads, and gives researchers ready access to accurate, up-to-date structural annotations.

Database URL: http://www.wwpdb.org/ftp/pdb-nextgen-archive-site

Conflict of interest statement

None declared.

Figures

Figure 1. Accessing SIFTS annotations in the NextGen Archive: this figure displays a snippet from the NextGen Archive PDBx/mmCIF file for PDB ID pdb_00004daj, together with a 3D representation of the molecular structure. (A) The ‘_pdbx_sifts_unp_segments’ category, showing two segments of PDB chain A mapped to the UniProtKB accessions P00720 and P08483, which suggests that PDB ID pdb_00004daj corresponds to a chimeric protein. (B) The ‘_pdbx_sifts_xref_db_segments’ category, which gives residue range-based cross-references to additional databases such as Pfam, SCOP2 and CATH. In this case, PDB chain A is associated with two Pfam domains, corresponding to a G-protein-coupled receptor (Pfam accession: PF00001) and Phage lysozyme (Pfam accession: PF00959). (C) The ‘_pdbx_sifts_xref_db’ category, which lists the mappings of each residue to external databases. The mappings from UniProt and the other cross-reference databases (Pfam/SCOP2/CATH) are highlighted in a box for residue Asn30 in chain A.
Figure 2. Accessing intra-molecular connectivity information in the NextGen Archive: this figure displays a snippet from the NextGen Archive PDBx/mmCIF file and a 3D representation of heme, identified as the Chemical Component Dictionary (CCD) component HEM, within PDB ID 3eqm. The ‘_chem_comp_bond’ and ‘_chem_comp_atom’ categories give, respectively, the bonds between atoms within a chemical component and the attributes of the individual atoms in that component. The image highlights a specific instance in which atom C3D forms a single bond with atoms C4D and CAD, and a double bond with atom C2D.
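The bond lookup described in the Figure 2 caption can be sketched in a few lines of Python. This is only an illustration: the three rows are transcribed from the caption, the tuple layout (comp_id, atom_id_1, atom_id_2, value_order) is assumed to mirror the items of the ‘_chem_comp_bond’ category, the bond-order codes SING and DOUB and the order of the two atoms within each row are assumptions, and `bonds_by_atom` is a hypothetical helper, not part of any wwPDB tool.

```python
from collections import defaultdict

# Rows shaped like the _chem_comp_bond category (assumed layout):
# (comp_id, atom_id_1, atom_id_2, value_order).
# Only the three bonds named in the Figure 2 caption are listed here.
BONDS = [
    ("HEM", "C2D", "C3D", "DOUB"),
    ("HEM", "C3D", "C4D", "SING"),
    ("HEM", "C3D", "CAD", "SING"),
]


def bonds_by_atom(rows):
    """Index bonds so that each atom maps to its partners and bond orders."""
    table = defaultdict(dict)
    for comp_id, atom_1, atom_2, order in rows:
        # A bond is undirected, so record it under both atoms.
        table[(comp_id, atom_1)][atom_2] = order
        table[(comp_id, atom_2)][atom_1] = order
    return table


table = bonds_by_atom(BONDS)
c3d_partners = table[("HEM", "C3D")]
for partner, order in sorted(c3d_partners.items()):
    print(f"C3D - {partner}: {order}")
```

Indexing by (component, atom) keeps the lookup independent of which atom a file lists first in a bond row.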
Figure 3. Systematic workflow of the NextGen Archive: this figure outlines the structured process for maintaining and updating the NextGen Archive. Key steps include collection of annotations from wwPDB partners, data quality checks, corrective actions, file aggregation and synchronization of data in the staging area.
Figure 4. Availability of four-letter PDB codes versus time: this figure depicts the annual count of available four-letter PDB codes. Current projections anticipate exhaustion of four-letter PDB codes by the end of 2027.
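The captions above use both identifier styles: the four-character code (3eqm) and the extended form (pdb_00004daj). A minimal Python sketch of converting between them follows. It assumes the extended form is the prefix "pdb_" followed by eight lowercase alphanumeric characters, with a four-character code left-padded with zeros, and that a four-character code starts with a digit from 1 to 9; both patterns and the function names `to_extended` and `to_legacy` are illustrative assumptions, not an official wwPDB API.

```python
import re
from typing import Optional

# Assumed patterns: a legacy code is a digit 1-9 plus three alphanumerics;
# an extended ID is "pdb_" plus eight lowercase alphanumerics.
LEGACY_ID = re.compile(r"[1-9][a-z0-9]{3}")
EXTENDED_ID = re.compile(r"pdb_([a-z0-9]{8})")


def to_extended(pdb_id: str) -> str:
    """Zero-pad a four-character PDB code into the extended form."""
    code = pdb_id.lower()
    if not LEGACY_ID.fullmatch(code):
        raise ValueError(f"not a four-character PDB code: {pdb_id!r}")
    return "pdb_" + code.rjust(8, "0")


def to_legacy(extended_id: str) -> Optional[str]:
    """Return the four-character code, or None if the ID has no short form."""
    match = EXTENDED_ID.fullmatch(extended_id.lower())
    if match is None:
        raise ValueError(f"not an extended PDB ID: {extended_id!r}")
    body = match.group(1)
    # Only IDs padded with four leading zeros map back to a legacy code.
    if body.startswith("0000") and LEGACY_ID.fullmatch(body[4:]):
        return body[4:]
    return None


print(to_extended("4daj"))
print(to_legacy("pdb_00003eqm"))
```

Returning None rather than raising for an ID with no short form reflects the expectation that, once four-letter codes run out, valid extended IDs will exist that cannot be shortened.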
