Note: Users should switch to binary mode before downloading data files.
The directory pub/pdb is the entry directory for the ftp site.
The directory pub/pdb/data/structures/divided contains the current PDB contents including PDB, mmCIF, and PDBML/XML formatted coordinate files, structure factors and NMR restraints:
- Entry files are date-stamped to show the date they were released
- Entries are grouped by the middle two characters of the 4-character PDB identifier. For example, entry file pdb100d.ent can be found in pub/pdb/data/structures/divided/pdb/00/pdb100d.ent.gz
- For more information on the contents of major directories please click here.
- A list of summaries of PDB data available on the FTP site can be found here.
- A site map of the FTP site can be found here.
- For information about large structures that cannot be represented in the legacy PDB file format see here.
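The directory layout and the binary-mode note above can be sketched in Python. This is a minimal illustration, not an official RCSB PDB tool: the helper names `divided_pdb_path` and `download_entry` are hypothetical, the host `ftp.wwpdb.org` is taken from the script URL listed on this page, and anonymous login is an assumption. The sketch covers only PDB-format files under `divided/pdb`.

```python
from ftplib import FTP
from pathlib import Path

# Assumption: host taken from the rsyncPDB.sh URL on this page.
FTP_HOST = "ftp.wwpdb.org"
DIVIDED_PDB_DIR = "pub/pdb/data/structures/divided/pdb"


def divided_pdb_path(pdb_id: str) -> str:
    """Build the archive path of the gzipped PDB-format file for an entry.

    Entries are grouped by the middle two characters of the
    4-character identifier, e.g. 100d -> .../pdb/00/pdb100d.ent.gz
    """
    pdb_id = pdb_id.strip().lower()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"expected a 4-character PDB identifier, got {pdb_id!r}")
    middle = pdb_id[1:3]  # characters 2 and 3 of the identifier
    return f"{DIVIDED_PDB_DIR}/{middle}/pdb{pdb_id}.ent.gz"


def download_entry(pdb_id: str, dest_dir: str = ".") -> Path:
    """Fetch one entry file over FTP (hypothetical helper)."""
    remote = divided_pdb_path(pdb_id)
    local = Path(dest_dir) / Path(remote).name
    with FTP(FTP_HOST) as ftp:
        ftp.login()  # assumption: anonymous login is accepted
        with open(local, "wb") as fh:
            # retrbinary transfers in binary mode, as the note above requires
            ftp.retrbinary(f"RETR {remote}", fh.write)
    return local
```

Calling `divided_pdb_path("100D")` reproduces the example path given above; `download_entry("100d")` would then write `pdb100d.ent.gz` to the current directory, provided the server is reachable.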
Automated Download of Data from the PDB FTP Archive
The RCSB PDB also provides some example scripts to assist in the automated download of data from the ftp site.
- ftp://snapshots.rcsb.org/rsyncSnapshots.sh uses rsync to make a local copy of an annual snapshot or of sections of a snapshot. The script is annotated to assist in downloading only sections of the archive.
- ftp://ftp.wwpdb.org/pub/pdb/software/rsyncPDB.sh uses rsync to copy the current contents of the entire archive.
Additional information on obtaining and maintaining copies of the entire PDB archive or certain portions of it is available at http://www.wwpdb.org/downloads.html.
Additional RCSB PDB FTP services
A supplemental ftp archive at ftp://resources.rcsb.org is maintained solely by the RCSB PDB. The table below summarizes its contents.
|Directory or File||Contents|
|Results of the weekly clustering of protein chains in the PDB at 30%, 40%, 50%, 70%, 90%, 95%, and 100% sequence identity. For more information, see Redundancy in the Protein Data Bank.|
|Results of the weekly clustering of protein chains in the PDB by cd-hit at 50%, 70%, 90%, and 95% sequence identity.|
|/sequence/clusters/not_in_clusters.txt||Contains nucleic acid chains and short polypeptides of fewer than 20 amino acids, which are not clustered.|
|/files/split_biol_assembly.txt||List of split entries (structures split across multiple PDB files) and their biological assemblies|
|/fatcat_rigid_pdb_all/fatcat_rigid_pdb_all.txt.gz||Data for all vs. all structure alignments for the full PDB archive, as described in Prlic et al., Bioinformatics 2010|
|/protmod/protmod.tsv.gz||Protein modification data for the full PDB archive, as described in Gao et al., Bioinformatics 2017|
Obtaining Files that used to be in the Brookhaven PDB FTP Archive
The RCSB PDB no longer maintains an up-to-date copy of the BNL PDB FTP archive. Please click here for more information.