The September 29, 2015 release offers the following features:
Validation Track on Protein Feature View
Mutation Track on Protein Feature View
Drill-down by UniProt Molecule Name
Update to Pfam version 28
Searching and Reporting by Chain Identifier
Sequence Cluster Report
Custom Report Web Services Improvements
Protein Feature View graphically summarizes a full-length protein sequence from UniProt and how it corresponds to PDB entries
and annotations from external resources in different "tracks".
A new track shows the quality of a protein chain as described in the wwPDB Validation Report mapped onto the amino acid sequence.
The validation track is color-coded to indicate the number of angle and bond outliers, as well as clashes for a given residue.
A red icon marks RSRZ outliers, which indicate a poor fit to the electron density map. Mousing over the validation track displays
further details of the outliers.
For example, chain A of PDB ID 4HHB,
a hemoglobin protein structure from 1984, is displayed in Protein Feature View. The validation track shows many red residues,
indicating numerous geometric outliers.
For comparison, here is the validation track of PDB ID 2W72 chain A,
a hemoglobin structure from 2008. While fewer angle and bond outliers are shown, this entry has several residues that fit
poorly to the electron density map (RSRZ > 2).
A description of wwPDB validation reports can be found at the wwPDB website and in the recommendations of the wwPDB X-ray Validation Task Force.
An Overview of the Protein Feature View is available.
The Mutation Track visually summarizes expression tags, cloning artifacts, and many other details about sequence mismatches between the studied protein sequences and the reference UniProt sequences.
In an overlay to the PDB track, new icons represent a number of different sequence modifications that can be observed in PDB files. Here are a few examples:
Example: Gag-Pol polyprotein - P12497
A new drill-down identifies the most common UniProt molecule names related to PDB entries.
Above is a summary of the drill-down of
the whole PDB archive.
We updated our Pfam annotation pipeline to use the latest Pfam version 28.
Pfam is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models.
Since this is a major update of Pfam, which was several years in the making, some of the search results have changed.
Protein kinases, for instance, can now be identified at the Pfam family level via
Protein kinase domain - PF00069,
or Protein tyrosine kinase - PF07714.
We currently do not offer grouping of results at the Pfam clan level (e.g., PKinase, CL0016).
When a new structure is released, we perform a search against Pfam using the HMMER web services API.
The PDB sequence records are used for this scan. Once the Pfam domain annotations have been calculated, they are mapped onto the
PDB-ATOM coordinates (the PDB residue numbers), thereby ensuring that atomic coordinates are available for the annotated residues.
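The mapping step described above can be illustrated with a minimal Python sketch. It is not the RCSB PDB pipeline itself: the function name, the input layout (one residue number per sequence position, with None for positions that have no coordinates) and the sample chain are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def map_domain_to_residue_numbers(
    domain_start: int,
    domain_end: int,
    residue_numbers: List[Optional[int]],
) -> Optional[Tuple[int, int]]:
    """Map a domain given in 1-based sequence positions onto PDB residue numbers.

    residue_numbers[i] holds the PDB residue number for sequence position i + 1,
    or None when that position has no atomic coordinates (hypothetical layout).
    Returns the first and last residue numbers that have coordinates,
    or None when no residue in the domain has coordinates.
    """
    observed = [
        number
        for number in residue_numbers[domain_start - 1:domain_end]
        if number is not None
    ]
    if not observed:
        return None
    return observed[0], observed[-1]


# Made-up chain: positions 1, 2 and 7 have no coordinates.
residue_numbers = [None, None, 1, 2, 3, 4, None, 6, 7, 8]

# A domain covering sequence positions 2 to 8 maps to residues 1 to 6.
print(map_domain_to_residue_numbers(2, 8, residue_numbers))  # (1, 6)
```

Trimming the domain to residues that have coordinates is one possible reading of the mapping; a real pipeline may also keep track of the gaps inside the domain.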
You can access these Pfam-PDB annotations via the RCSB PDB RESTful API in the following way:
Details of the Pfam to PDB annotation pipeline are described in an article at the
A search by PDBId.ChainId(s) has been added to the Advanced Search system. Users can enter a
comma-separated PDBId.ChainId list and get the results for the specified polymer chains.
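For readers preparing such lists in their own scripts, here is a small Python sketch that splits a comma-separated PDBId.ChainId string into pairs. The function name and the validation rules are assumptions, not part of the Advanced Search system; upper-casing the PDB ID while leaving the chain identifier untouched is a choice made for this sketch.

```python
from typing import List, Tuple


def parse_chain_list(text: str) -> List[Tuple[str, str]]:
    """Split 'PDBId.ChainId, PDBId.ChainId, ...' into (pdb_id, chain_id) pairs.

    The input order is preserved. Empty items are skipped, and an item
    without both a PDB ID and a chain identifier raises ValueError.
    """
    chains = []
    for token in text.split(","):
        token = token.strip()
        if not token:
            continue
        pdb_id, separator, chain_id = token.partition(".")
        if not separator or not pdb_id or not chain_id:
            raise ValueError(f"expected PDBId.ChainId, got {token!r}")
        # Assumption: PDB IDs are normalized to upper case, chain IDs are kept as typed.
        chains.append((pdb_id.upper(), chain_id))
    return chains


print(parse_chain_list("4HHB.A, 2w72.A"))  # [('4HHB', 'A'), ('2W72', 'A')]
```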
Furthermore, tabular reports can be generated based on these chain identifiers. The chain-based
summary reports such as Sequence Report, Biological Details Report, and Sequence Clusters Report
can be retrieved with one click from the Summary Reports drop-down list.
Users can also pick the chain based report fields in the Customizable Table to take advantage of
Generate chain-based custom report:
A new Sequence Cluster Report has been added to the tabular report system. The report includes
sequence identity clusters at levels ranging from 100% and 90% down to 30%. Sequence clusters contain protein chains grouped by sequence identity.
For example, the 90% sequence cluster groups protein sequences that are at least 90% identical.
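To show how cluster assignments can be used once a report has been retrieved, the Python sketch below groups chains by their cluster number at one identity level. The chain identifiers and cluster numbers are placeholders invented for illustration, and the function is not part of any RCSB PDB service.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_cluster(cluster_of_chain: Dict[str, int]) -> Dict[int, List[str]]:
    """Group PDBId.ChainId strings by their cluster number at one identity level."""
    groups = defaultdict(list)
    for chain, cluster in cluster_of_chain.items():
        groups[cluster].append(chain)
    return dict(groups)


# Placeholder assignments at a single identity level (not real report data).
clusters_90 = {"XXXX.A": 1, "YYYY.A": 1, "XXXX.B": 2}

print(group_by_cluster(clusters_90))
# {1: ['XXXX.A', 'YYYY.A'], 2: ['XXXX.B']}
```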
The report also includes UniProt Recommended and Alternative Names, Gene Name, and Macromolecular Name and Synonyms. Below is a complete list of fields in this report.
All queries and reports that can be generated on the website are available using RESTful Web Services,
including the Sequence Cluster Report.
To better support user workflow and data analysis requirements, the reports generated by our RESTful services
maintain the PDBId and PDBId.ChainId in the same order as the input query string.
The following example generates a custom report with PDBId.ChainId combinations and selected fields from the Sequence Cluster Report. The report lists the PDBId.ChainId in the same order as the input query string and the output is in CSV format.
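The ordering guarantee can also be checked on the client side. The Python sketch below is not the web service call itself: it parses CSV text that has already been downloaded and compares the row order with the requested PDBId.ChainId list. The column names (structureId, chainId, clusterNumber90) and the sample rows are assumptions used only for illustration.

```python
import csv
import io
from typing import List


def chain_ids_in_report(
    csv_text: str,
    pdb_column: str = "structureId",  # assumed column name
    chain_column: str = "chainId",    # assumed column name
) -> List[str]:
    """Return the PDBId.ChainId of each row of a CSV report, in row order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [f"{row[pdb_column]}.{row[chain_column]}" for row in reader]


# Requested chains, in the order given in the query string.
requested = ["4HHB.A", "2W72.A", "4HHB.B"]

# Made-up CSV text standing in for a downloaded report.
report = (
    "structureId,chainId,clusterNumber90\n"
    "4HHB,A,1\n"
    "2W72,A,1\n"
    "4HHB,B,2\n"
)

# True when the report rows follow the input order.
print(chain_ids_in_report(report) == requested)
```

Because the report order matches the input order, results can be joined to the input list by position, without a lookup by identifier.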
Custom report Web Services: All tabular report fields