The RCSB PDB RESTful Web Service interface
The RCSB PDB supports RESTful (REpresentational State Transfer) Web Services to make accessing data easier. Please use these services instead of screen-scraping.
The RESTful interface is organized around two types of services:
- Search services: to return a list of IDs (e.g., PDB IDs, chain IDs, ligand IDs)
- Fetch services: to return data given an ID (e.g., reports, descriptions, data items)
The services below are currently provided; please let us know if you have additional suggestions.
- A generic SEARCH service that accepts advanced queries via POST
- Search for ligands and PDB IDs based on a SMILES query
About SEARCH services results
The advanced search system offers more than 80 query options. Any advanced query can be run by posting its XML query representation to the search service. Queries fall into four types based on the kind of result they return.
- Structure-based queries return a list of PDB IDs. Some examples are Author Name query, Macromolecule Type query, etc.
- Entity-based queries return a list of PDB IDs appended with entity IDs in the format of pdbid:entityid,...,pdbidn:entityidn. Some examples are Sequence BLAST query, Wild Type Protein query, etc.
- Chain-based queries (e.g., Chain ID query) return a list of PDB IDs appended with chain IDs in the format pdbid:chainid,...,pdbidn:chainidn. This is useful for generating reports on specific chains.
- Chemical component queries return a list of ligand IDs. Some examples are Chemical Name query, Chemical structure (SMILES), etc.
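The four result formats above can be handled by one small parser. The following is a minimal sketch, not part of the service itself: the function name parse_search_ids is hypothetical, and it assumes that results arrive as plain text separated by commas or line breaks.

```python
from typing import List, Optional, Tuple


def parse_search_ids(text: str) -> List[Tuple[str, Optional[str]]]:
    """Split a search result into (id, sub_id) pairs.

    Structure-based and chemical component results have no sub_id (None).
    Entity-based and chain-based results carry the entity or chain ID
    after the colon, e.g. "4HHB:1" or "4HHB:A".
    """
    pairs = []
    # Assumption: items are separated by commas and/or whitespace.
    for token in text.replace(",", " ").split():
        head, sep, tail = token.partition(":")
        pairs.append((head, tail if sep else None))
    return pairs
```

For example, parse_search_ids("4HHB:1,1HHB:2") yields one pair per entity, while a plain list of PDB IDs yields pairs whose second element is None.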
- Get descriptions for the whole PDB file
- Get descriptions for PDB entities
- Get descriptions of chemical components
- Get ligands that are present in a single PDB entry
Lists and status
- Get a PDB ID's release status
- Get a list of all currently released PDB IDs
- Get a list of unreleased PDB IDs
- Get the pre-release sequences in FASTA format
- If there are biological assemblies, get the number of biological assemblies that are available for a PDB ID
- Third-party annotations mapped onto PDB chains
- PDB to UniProtKB mappings
- Pfam mappings for PDB
- Access Gene Ontology terms for a PDB chain.
Sequence and Structure Clusters
- Sequence and Structure cluster related services
- Example: Access the information for one file: /pdb/rest/describePDB?structureId=4hhb
- Example: Access the information for multiple files: /pdb/rest/describePDB?structureId=4hhb,1hhb
- Example: List all current PDB IDs in XML: /pdb/rest/getCurrent
- Example: List all current PDB IDs in JSON: /pdb/json/getCurrent
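The fetch examples above could be called from a script along these lines. This is a sketch under stated assumptions: the paths come from the examples, but the host in BASE_URL is not given in this document and is assumed here, and the helper names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the document only gives relative paths; adjust the host as needed.
BASE_URL = "https://www.rcsb.org"


def describe_pdb_url(structure_ids, base=BASE_URL):
    """Build a describePDB URL for one or more PDB IDs."""
    # Keep commas unescaped so the URL matches the documented examples.
    query = urlencode({"structureId": ",".join(structure_ids)}, safe=",")
    return f"{base}/pdb/rest/describePDB?{query}"


def get_current_url(fmt="rest", base=BASE_URL):
    """Build a getCurrent URL; fmt is "rest" (XML) or "json"."""
    if fmt not in ("rest", "json"):
        raise ValueError("fmt must be 'rest' or 'json'")
    return f"{base}/pdb/{fmt}/getCurrent"


def fetch(url, timeout=30):
    """Download a URL and return the response body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

A call such as fetch(describe_pdb_url(["4hhb", "1hhb"])) would then return the XML description for both entries, provided the assumed host serves these paths.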
BLAST with Structure ID and Chain ID. PostBLASTQuery.java
- Example: Access the information for a subset of PDB IDs: /pdb/rest/getUnreleased?structureId=450D,451D
- Example: Access the information for all unreleased PDB IDs: /pdb/rest/getUnreleased
- Example: Access the information of unreleased PDB IDs based on SG Project Initials: /pdb/rest/getUnreleased?sgInit=SSGCID
- Example: Access the information for a subset of PDB IDs: /pdb/rest/getStatusSequence?structureId=3MU6,3QV1,3SSF
- Example: Access the information filtered by entity type and deposition date range: /pdb/rest/getStatusSequence?entityType=RNA&depositionDateMin=2011-07-01&depositionDateMax=2011-07-30 (The entity type can be RNA, DNA, or Polypeptide.)
- Example: Access the information for all unreleased PDB IDs: /pdb/rest/getStatusSequence
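The optional filters on getUnreleased and getStatusSequence can be assembled programmatically. The sketch below only builds URLs; the function name status_url and the host in BASE_URL are assumptions, while the parameter names are taken from the examples above.

```python
from urllib.parse import urlencode

# Assumption: host is not stated in this document.
BASE_URL = "https://www.rcsb.org"

STATUS_SERVICES = ("getUnreleased", "getStatusSequence")


def status_url(service, **filters):
    """Build a status URL; filters with value None are left out.

    List or tuple values (e.g. several PDB IDs) are joined with commas.
    """
    if service not in STATUS_SERVICES:
        raise ValueError(f"unknown service: {service}")
    params = {}
    for name, value in filters.items():
        if value is None:
            continue
        if isinstance(value, (list, tuple)):
            value = ",".join(value)
        params[name] = value
    url = f"{BASE_URL}/pdb/rest/{service}"
    # With no filters, the service is documented to return all unreleased entries.
    return f"{url}?{urlencode(params, safe=',')}" if params else url
```

For instance, status_url("getStatusSequence", entityType="RNA", depositionDateMin="2011-07-01", depositionDateMax="2011-07-30") reproduces the date-range example.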
Not all PDB entries have biological assemblies available and some have multiple. Details that are necessary to recreate a biological assembly from the asymmetric unit can be accessed from the following requests.
- Number of biological assemblies associated with a PDB entry /pdb/rest/bioassembly/nrbioassemblies?structureId=1hv4
- Access the transformation information needed to generate a biological assembly (nr=0 will return information for the asymmetric unit, nr=1 will return information for the first assembly, etc.) /pdb/rest/bioassembly/bioassembly?structureId=1hv4&nr=1
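A typical workflow is to ask for the number of assemblies first and then request each one in turn. The sketch below builds the URLs for that loop; it does not parse the responses, since their layout is not described here. The helper names and the host in BASE_URL are assumptions.

```python
from urllib.parse import urlencode

# Assumption: host is not stated in this document.
BASE_URL = "https://www.rcsb.org"


def assembly_count_url(structure_id):
    """URL that reports how many biological assemblies an entry has."""
    query = urlencode({"structureId": structure_id})
    return f"{BASE_URL}/pdb/rest/bioassembly/nrbioassemblies?{query}"


def assembly_urls(structure_id, count, include_asymmetric_unit=False):
    """URLs for assemblies 1..count; nr=0 (asymmetric unit) is optional."""
    start = 0 if include_asymmetric_unit else 1
    urls = []
    for nr in range(start, count + 1):
        query = urlencode({"structureId": structure_id, "nr": nr})
        urls.append(f"{BASE_URL}/pdb/rest/bioassembly/bioassembly?{query}")
    return urls
```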
- /pdb/rest/sequenceCluster?cluster=40&structureId=4hhb.A Get the sequence cluster at 40% sequence identity for a particular chain.
- /pdb/rest/representatives?cluster=40 Get all representative chains from the 40% sequence clustering.
- /pdb/rest/representatives?structureId=4hhb.A Get the representative for a chain (the chain on Rank 1 in the sequence clustering).
- /pdb/rest/representativeDomains?cluster=40 Get all representative domains that are used for the systematic structure alignments.
- /pdb/rest/representativeDomains?structureId=4hhb.A Get the representative domains for a chain.
The names of the representative domains are either SCOP domain IDs (starting with "d") or domain IDs assigned by the Protein Domain Parser software (starting with "PDP:").
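Client code often needs to split the pdbid.chain references used in these requests and to tell the two kinds of domain names apart. A minimal sketch follows; the function names are hypothetical, and the classification relies only on the prefixes described above.

```python
def split_chain_ref(ref):
    """Split a reference such as "4hhb.A" into (pdb_id, chain_id)."""
    pdb_id, sep, chain_id = ref.partition(".")
    if not sep or not pdb_id or not chain_id:
        raise ValueError(f"expected pdbid.chainid, got {ref!r}")
    return pdb_id, chain_id


def domain_source(name):
    """Classify a representative domain name by its prefix."""
    # Check "PDP:" first; SCOP IDs are recognized by a leading lowercase "d".
    if name.startswith("PDP:"):
        return "PDP"
    if name.startswith("d"):
        return "SCOP"
    return "unknown"
```

For example, domain_source("d1iarb1") returns "SCOP", the same domain ID that appears in the sorting example for the structure comparison table.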
The all vs. all structural similarity results table for a representative chain can be downloaded in XML. For example, this returns Rank, PDB.Chain, Description, P-value, Score, RMSD, Len1, Len2, %Sim1, and %Sim2 for 3BMV.A
Note: A maximum of 2000 rows can be returned through this URL. To fetch all of the approximately 17,000 results for a chain, page through the results using the page parameter.
Additional parameters for this URL allow sorting. For example, this sorts by P-value: /pdb/explorer/structCompXMLData.jsp?method=pw_fatcat&showAllResults=false&chain=d1iarb1&rows=15&page=1&sidx=probability&sord=asc The sidx parameter accepts the following sort keys: name2, desc, probability, score, rmsdOpt, len1, len2, pid, sim1, sim2.
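Paging through the full result set under the 2000-row limit can be sketched as follows. The query parameters are copied from the example URL; the host in BASE_URL, the function name page_urls, the assumption that pages are numbered from 1, and the "desc" sort order are not confirmed by this document.

```python
import math
from urllib.parse import urlencode

# Assumption: host is not stated in this document.
BASE_URL = "https://www.rcsb.org"
MAX_ROWS = 2000  # documented per-request limit
SORT_KEYS = {"name2", "desc", "probability", "score", "rmsdOpt",
             "len1", "len2", "pid", "sim1", "sim2"}


def page_urls(chain, total_results, rows=MAX_ROWS,
              sort_by="probability", order="asc"):
    """Return one URL per page needed to cover total_results rows."""
    if not 1 <= rows <= MAX_ROWS:
        raise ValueError(f"rows must be between 1 and {MAX_ROWS}")
    if sort_by not in SORT_KEYS:
        raise ValueError(f"unsupported sort key: {sort_by}")
    if order not in ("asc", "desc"):  # "desc" is an assumption
        raise ValueError("order must be 'asc' or 'desc'")
    pages = math.ceil(total_results / rows)
    urls = []
    for page in range(1, pages + 1):  # assumed 1-based, as in the example
        query = urlencode({
            "method": "pw_fatcat",
            "showAllResults": "false",  # copied unchanged from the example
            "chain": chain,
            "rows": rows,
            "page": page,
            "sidx": sort_by,
            "sord": order,
        })
        urls.append(f"{BASE_URL}/pdb/explorer/structCompXMLData.jsp?{query}")
    return urls
```

With roughly 17,000 results and 2000 rows per page, this produces nine URLs, the last of which returns a partial page.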
The RESTful services that provide third-party annotations (SCOP, CATH, Pfam, etc.) follow the DAS protocol. The DAS server is available from /pdb/rest/das. The DAS SOURCES command, which describes the provided sources, is available from /pdb/rest/das/sources. At present, two DAS sources are provided:
pdbchainfeatures: Provides various third-party annotations that have been computed for PDB chains.
Get the third party features (a DAS - FEATURES request): /pdb/rest/das/pdbchainfeatures/features?segment=5pti.A
Get the ATOM sequence of the PDB chain (a DAS - SEQUENCE request): /pdb/rest/das/pdbchainfeatures/sequence?segment=5pti.A
pdb_uniprot_mapping: Serves the alignment between UniProtKB and PDB, derived from the SIFTS mappings.
Example: Get all alignments for a PDB (a DAS - ALIGNMENT request) /pdb/rest/das/pdb_uniprot_mapping/alignment?query=4hhb
Example: Get all alignments for a single PDB chain (a DAS - ALIGNMENT request) /pdb/rest/das/pdb_uniprot_mapping/alignment?query=4hhb.A
For more documentation on the DAS protocol, see http://www.dasregistry.org/
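The DAS requests shown above differ only in the source, the command, and the name of the query parameter, so they lend themselves to a table-driven helper. This is a sketch rather than a DAS client: the function name das_url and the host in BASE_URL are assumptions, and only the three source/command combinations listed above are included.

```python
from urllib.parse import urlencode

# Assumption: host is not stated in this document.
BASE_URL = "https://www.rcsb.org"

# (source, command) -> name of the query parameter, per the examples above.
DAS_COMMANDS = {
    ("pdbchainfeatures", "features"): "segment",
    ("pdbchainfeatures", "sequence"): "segment",
    ("pdb_uniprot_mapping", "alignment"): "query",
}


def das_url(source, command, target):
    """Build a DAS request URL for a PDB ID or pdbid.chain target."""
    try:
        param = DAS_COMMANDS[(source, command)]
    except KeyError:
        raise ValueError(f"unsupported DAS request: {source}/{command}") from None
    query = urlencode({param: target})
    return f"{BASE_URL}/pdb/rest/das/{source}/{command}?{query}"
```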
- EXACT: find an exact structure match
- SUBSTRUCTURE: find ligands that contain the specified structure as a substructure
- SUPERSTRUCTURE: find ligands that are substructures (fragments) of the specified structure /pdb/rest/smilesQuery?smiles=OC(=O)c1ccc(OCc2ccccc2)cc1&search_type=superstructure
- SIMILARITY: find structures that bind similar ligands.
Specify a similarity threshold in the [0...1] range to set the required degree of similarity, where 0 means dissimilar and 1 means identical.
The similarity is based on the number of chemical features in common between the query and the target molecule. Similarity is calculated using the Tanimoto Coefficient. /pdb/rest/smilesQuery?smiles=OC(=O)c1ccc(OCc2ccccc2)cc1&search_type=similarity&similarity=0.7
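SMILES strings contain characters such as "(", ")" and "=" that are safer to percent-encode when placed in a URL. The sketch below builds a smilesQuery URL and also illustrates the Tanimoto coefficient on plain sets of features. The host in BASE_URL and the function names are assumptions, and the feature sets are purely illustrative: this document does not say which chemical features the server compares.

```python
from urllib.parse import urlencode

# Assumption: host is not stated in this document.
BASE_URL = "https://www.rcsb.org"
SEARCH_TYPES = ("exact", "substructure", "superstructure", "similarity")


def smiles_query_url(smiles, search_type, similarity=None):
    """Build a smilesQuery URL with the SMILES string percent-encoded."""
    if search_type not in SEARCH_TYPES:
        raise ValueError(f"unknown search_type: {search_type}")
    params = {"smiles": smiles, "search_type": search_type}
    if search_type == "similarity":
        if similarity is None or not 0.0 <= similarity <= 1.0:
            raise ValueError("similarity must be in the range [0, 1]")
        params["similarity"] = similarity
    return f"{BASE_URL}/pdb/rest/smilesQuery?{urlencode(params)}"


def tanimoto(features_a, features_b):
    """Tanimoto coefficient: shared features / features in either set."""
    a, b = set(features_a), set(features_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)
```

Two molecules sharing two of four distinct features, for instance tanimoto({1, 2, 3}, {2, 3, 4}), score 0.5, which would fall below a threshold of 0.7.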
This interface exposes the RCSB PDB advanced search interface as an XML Web Service.
To use this service, POST an XML representation of an advanced search to /pdb/rest/search.
XML representation of advanced search
Every advanced search can be represented by XML. To view an example representation, execute an advanced search query and then click on the Result tab. The Query Details link at the top of the page shows the XML for that query.
Every query is described by two data items:
- queryType: the name of the class that is implementing the query
- arguments: depending on the type of query that is being executed one or more differently named arguments need to be provided.
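A query document with these two kinds of data items could be built and posted as sketched below. Several details are assumptions, not taken from this document: the root element name orgPdbQuery, the example class name and keywords argument used in the usage note, the Content-Type header, and the host in SEARCH_URL. The most reliable source for the exact XML is the Query Details view of a real search.

```python
import xml.etree.ElementTree as ET
from urllib.request import Request, urlopen

# Assumption: host is not stated in this document.
SEARCH_URL = "https://www.rcsb.org/pdb/rest/search"


def build_query_xml(query_type, arguments, root_tag="orgPdbQuery"):
    """Serialize a query as XML: one queryType element plus one child per argument.

    root_tag is an assumed element name; copy the real one from Query Details.
    """
    root = ET.Element(root_tag)
    ET.SubElement(root, "queryType").text = query_type
    for name, value in arguments.items():
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


def post_query(xml_text, url=SEARCH_URL, timeout=30):
    """POST the XML and return the response body (a list of IDs) as text."""
    request = Request(
        url,
        data=xml_text.encode("utf-8"),
        # Assumption: the expected content type is not documented here.
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

As a usage illustration with hypothetical values, build_query_xml("org.pdb.query.simple.AdvancedKeywordQuery", {"keywords": "actin"}) produces a single-line XML document that can be passed to post_query.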