New Features

PDB NextGen Archive Now Provides Intra-molecular Connectivity

07/04 wwPDB News

Version 1.0 of the next generation archive repository (NextGen) for the PDB archive was made available in early 2023. This “NextGen” archive hosts enriched atomic coordinate files, in both PDBx/mmCIF and PDBML formats, with files available to download at files-nextgen.wwpdb.org.

The initial launch of the NextGen archive enriched coordinate files from the core PDB archive with sequence annotation from external resources such as UniProt, SCOP2, and Pfam at the atom, residue, and chain levels. After consultation with the user community, this release adds intra-molecular connectivity for each residue present in an entry, to help users transition from the legacy PDB format to the PDBx/mmCIF format. The connectivity information includes atom pairs, bond order, aromatic flag, and stereochemistry, as incorporated from the PDB Chemical Component Dictionary (CCD). Users can extract this information from the _chem_comp_bond and _chem_comp_atom categories of the PDBx/mmCIF-formatted files in the NextGen archive.
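As a rough illustration of pulling bond records out of the _chem_comp_bond category, the sketch below reads one loop_ block from mmCIF text using only the Python standard library. The helper read_loop and the inline SAMPLE are hypothetical; the item names (atom_id_1, value_order, pdbx_aromatic_flag, pdbx_stereo_config) follow the CCD convention and are assumed, not confirmed here, to appear unchanged in NextGen files. The tokenizer is deliberately simplified (no multi-line values), so a full mmCIF parser is the better choice for real files.

```python
import shlex

# Hypothetical excerpt in the style of a CCD connectivity loop (alanine, heavy atoms only).
SAMPLE = """\
loop_
_chem_comp_atom.comp_id
_chem_comp_atom.atom_id
_chem_comp_atom.type_symbol
ALA N  N
ALA CA C
ALA C  C
ALA O  O
#
loop_
_chem_comp_bond.comp_id
_chem_comp_bond.atom_id_1
_chem_comp_bond.atom_id_2
_chem_comp_bond.value_order
_chem_comp_bond.pdbx_aromatic_flag
_chem_comp_bond.pdbx_stereo_config
ALA N  CA SING N N
ALA CA C  SING N N
ALA C  O  DOUB N N
#
"""


def read_loop(text, category):
    """Return the rows of the first loop_ block for `category` as dicts.

    Simplified: assumes one record per line and no multi-line values.
    """
    prefix = category + "."
    columns, rows = [], []
    state = "seek"
    for raw in text.splitlines():
        line = raw.strip()
        if state == "seek":
            if line == "loop_":
                columns, state = [], "header"
            continue
        if state == "header":
            if line.startswith(prefix):
                columns.append(line[len(prefix):])
                continue
            if line.startswith("_") or not columns:
                # This loop belongs to a different category; keep looking.
                state = "seek"
                continue
            state = "rows"  # first data line of the wanted loop
        # state == "rows": stop at the end of the loop.
        if not line or line.startswith(("#", "_")) or line == "loop_":
            break
        # shlex handles double-quoted atom names such as "O5'".
        rows.append(dict(zip(columns, shlex.split(line))))
    return rows


bonds = read_loop(SAMPLE, "_chem_comp_bond")
for bond in bonds:
    print(bond["comp_id"], bond["atom_id_1"], bond["atom_id_2"], bond["value_order"])
```

The same helper can be pointed at "_chem_comp_atom" to list the atoms of each residue.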

To support the transition from legacy PDB format to PDBx/mmCIF, file names and directories are based on extended PDB IDs, and entries are divided into directories by a two-letter hash code formed from the third-from-last and second-from-last characters of the ID. This hash code will remain consistent once PDB ID codes are extended beyond four characters with the pdb_ prefix. For example, PDB entry 8aly (hash code “al”) is available at https://files-nextgen.wwpdb.org/pdb_nextgen/data/entries/divided/al/pdb_00008aly/pdb_00008aly_xyz-enrich.cif.gz.
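The naming scheme above can be sketched as a small helper that builds the download URL for an entry. The function name nextgen_url is hypothetical, and the zero-padding of the ID to eight characters after the pdb_ prefix is an assumption inferred from the single 8aly example rather than a stated rule.

```python
BASE = "https://files-nextgen.wwpdb.org/pdb_nextgen/data/entries/divided"


def nextgen_url(pdb_id):
    """Build the enriched PDBx/mmCIF URL for a PDB ID (hypothetical helper)."""
    code = pdb_id.strip().lower()
    if code.startswith("pdb_"):
        code = code[len("pdb_"):]
    # Assumption: extended IDs are zero-padded to eight characters, as in pdb_00008aly.
    extended = "pdb_" + code.zfill(8)
    # Hash code: third-from-last and second-from-last characters of the ID.
    hash_code = extended[-3:-1]
    return f"{BASE}/{hash_code}/{extended}/{extended}_xyz-enrich.cif.gz"


print(nextgen_url("8aly"))
```

Because the hash code is taken from the end of the ID, the four-character form and the extended form of the same entry map to the same directory.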

Users are encouraged to adopt PDBx/mmCIF format as early as possible. Learn more about PDBx/mmCIF format and related software resources at mmcif.wwpdb.org.

In the future, the PDB NextGen archive will continue to be updated with more enriched annotations from external database resources in the metadata, building on the content already provided in the structure model files in the PDB archive at files.wwpdb.org.
