DATA QUERY, REPORTING AND ACCESS

PDB Statistics: Structures Solved by Multiple Methods

As the PDB archive grows, more proteins are represented by structures determined with different experimental methods. The RCSB PDB provides a statistics table that highlights structures solved by multiple techniques at www.rcsb.org/pdb/statistics/clusterExpMethods.do.

To generate this list of proteins solved by multiple experimental methods, PDB protein chains of at least 100 amino acids were grouped with CD-HIT into clusters of greater than 95% sequence similarity. Only clusters that contain at least one structure solved by one method (e.g., X-ray) and at least one structure solved by a different method (e.g., NMR) are listed.
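The selection step described above can be sketched in a few lines of Python. This is only an illustration, not the RCSB PDB implementation: it assumes the CD-HIT cluster assignments are already available, and the function name, the tuple layout, and the sample records are hypothetical.

```python
from collections import defaultdict

def multi_method_clusters(chains, min_length=100):
    """Return clusters whose members were solved by two or more methods.

    chains: iterable of (cluster_id, chain_label, length, method) tuples.
    This layout is an assumption made for the sketch; it is not the
    output format of CD-HIT.
    """
    methods = defaultdict(set)
    members = defaultdict(list)
    for cluster_id, chain_label, length, method in chains:
        if length < min_length:
            continue  # only chains of at least min_length residues count
        methods[cluster_id].add(method)
        members[cluster_id].append(chain_label)
    # keep a cluster only if it mixes at least two experimental methods
    return {cid: sorted(members[cid])
            for cid, found in methods.items() if len(found) >= 2}

# Hypothetical records, for illustration only
sample = [
    (1, "chain_a", 150, "X-ray"),
    (1, "chain_b", 150, "NMR"),
    (2, "chain_c", 200, "X-ray"),
    (2, "chain_d", 200, "X-ray"),
]
selected = multi_method_clusters(sample)
```

With these sample records, only cluster 1 is kept, because cluster 2 contains structures from a single method.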

 


PDB ID 1grl: Braig, K., Otwinowski, Z., Hegde, R., Boisvert, D.C., Joachimiak, A., Horwich, A.L., Sigler, P.B. The crystal structure of the bacterial chaperonin GroEL at 2.8 Å. Nature v371 pp.578-586 (1994)

 


Time-stamped Copies of PDB Archive Available via FTP

Time-stamped yearly snapshots of the PDB Archive are available from ftp://snapshots.rcsb.org/. These snapshots are intended to provide readily identifiable data sets for research using PDB data.

The directory 20060103, which contains the exact and complete contents of the FTP archive as it appeared on January 3, 2006, has been added. It includes the 34,421 experimentally determined coordinate files that were current (i.e., not obsolete) as of January 3, 2006. It joins the directory 20050106, which contains the frozen contents of the FTP archive as of January 6, 2005.

These snapshots follow the historical directory structure: coordinate files are contained in subdirectories named after the two middle characters of the PDB ID. For example, 100d is found in the directory '00'. In addition, symbolic links lead directly to the sets of experimentally determined coordinate files in PDB, mmCIF, and XML formats.
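The naming rule can be expressed as a short Python helper. This is a minimal sketch: the function name is hypothetical, and lowercasing the ID is an assumption, since the text gives only the lowercase example 100d.

```python
def snapshot_subdir(pdb_id: str) -> str:
    """Return the subdirectory name for a four-character PDB ID.

    The subdirectory is the two middle characters of the ID.
    Lowercasing is an assumption made for this sketch.
    """
    pdb_id = pdb_id.strip().lower()
    if len(pdb_id) != 4:
        raise ValueError(f"expected a four-character PDB ID, got {pdb_id!r}")
    return pdb_id[1:3]

print(snapshot_subdir("100d"))  # prints 00
```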

The date and time stamp of each file indicates the last time the file was modified. Entries in the PDB archive have been processed by the three members of the wwPDB (RCSB, MSD-EBI, and PDBj).

 


Structural Genomics Tools and Portal Described in Nucleic Acids Research Database Issue

"The RCSB PDB information portal for structural genomics" has been published in the 2006 Database issue of Nucleic Acids Research.

The article describes the online tools, summary reports, and target information related to structural genomics from a new information portal at sg.pdb.org.

From this site, information and links are provided for the structural genomics initiatives located worldwide, including reports for each center that provide target lists, target status progress, targets in the PDB, and sequence redundancy analyses.

Databases that track the progress of protein studies are available. TargetDB contains information about the progress of the production and solution of structures. PepcDB extends the content of TargetDB with status history, stop conditions, reusable text protocols and contact information collected from the PSI Centers.

A tool is also provided to explore the distributions of functions found among structural genomics structures, PDB structures, genomes, and homology models. This functional coverage can be examined according to Enzyme Classification, Gene Ontology (Biological Process, Cell Component, or Molecular Function) and Disease.

The abstract and full text of the article are also available from the Nucleic Acids Research website.

Andrei Kouranov, Lei Xie, Joanna de la Cruz, Li Chen, John Westbrook, Philip E. Bourne and Helen M. Berman
The RCSB PDB information portal for structural genomics
Nucleic Acids Research, 2006, Vol. 34, Database issue D302-D305

 


WEBSITE STATISTICS

Access statistics are given below for the RCSB PDB website at www.pdb.org.