Read the latest PDB news.
Earlier news is available and is archived in the RCSB PDB newsletters.
The PDB staff wish to extend our best wishes to the community for a happy holiday season and a wonderful new year!
The PDB update that would normally occur on December 25 will instead take place on December 21. The update that would normally occur on January 1 will take place on December 28.
The regular PDB update schedule will return with the January 9 update.
The fall 2001 edition of the PDB newsletter is now being distributed in print format. This periodical features articles covering the previous three months of PDB's progress, as well as current developments in Data Deposition; Data Query, Reporting, and Access; and Outreach. This issue is also available in HTML, plain text, and PDF formats at http://www.rcsb.org/pdb/general_information/news_publications/newsletters/2001q3/index.html. Previous editions of the PDB newsletter are available at http://www.rcsb.org/pdb/general_information/news_publications/newsletters/index.html. To receive a printed copy of PDB Newsletter 11, please email your request and postal address to firstname.lastname@example.org.
ADIT, the AutoDep Input Tool, is the integrated software system used by PDB annotation staff for checking and editing PDB structure data entries.
A version of this software for workstation use has been released for alpha testing at http://pdb.rutgers.edu/software. The system includes tools to help users check and prepare structure depositions. A file in mmCIF format can be created and then deposited to the PDB at http://deposit.rcsb.org/adit/.
The functionality of the workstation version is similar to that provided by the Web-based PDB deposition system (http://deposit.rcsb.org/adit/). The alpha version of the software is currently available in binary form for Linux platforms. Questions about this software may be sent to email@example.com.
The STING Millennium Suite (SMS) is a set of Java-based tools for the simultaneous display of information about macromolecular structure and sequence. Individual components of SMS are now available as hyperlinks from the Structure Explorer pages of the PDB beta test site.
SMS was developed by Dr. Goran Neshich of Embrapa-CNPTIA (Campinas, Brazil) and colleagues, in collaboration with Dr. Barry Honig's laboratory at Columbia University in New York City, NY. The SMS links from the PDB site are served by an SMS mirror that is now being maintained at the San Diego Supercomputer Center.
The "Sequence Details" and "View Structure" sections of the Structure Explorer now link to two interactive SMS views for any PDB structure. Users can access both structure and sequence views for a particular structure, which include options to access features such as a graphical display of amino acid contacts; these views require Chime and a Java-enabled Web browser. Instructions for the installation and configuration of Chime are available at http://www.rcsb.org/pdb/resources/help/help_graphics.html.
A simpler "Protein Dossier" view is also available from the "Sequence Details" section, offering a static graphical summary of sequence-based properties, such as relative entropy and PROSITE motifs, as well as structure-based properties, such as temperature factors, solvent accessibility, amino acid contacts, and interface (chain contact) regions.
The "Geometry" section of the Structure Explorer now links to a Ramachandran plot for each PDB entry, also served from SMS. Options in this view link the data in the dihedral angle plot to the 3-D structure of the molecule. Subsets of amino acids can also be highlighted to better correlate positions in the 3-D structure with points in the Phi/Psi plot. This view also requires Java and Chime.
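As a brief illustration of what a Ramachandran plot displays, the following is a minimal sketch of how a single dihedral angle can be computed from four atomic positions. It is not taken from SMS; the function name and the test coordinates are illustrative assumptions. For Phi the four atoms are C(i-1), N(i), CA(i), C(i), and for Psi they are N(i), CA(i), C(i), N(i+1).

```python
import math


def dihedral(p0, p1, p2, p3):
    """Return the dihedral angle, in degrees, defined by four 3-D points.

    The sign follows the usual convention: looking along the central
    bond p1 -> p2, a clockwise rotation from p0 to p3 is positive.
    """
    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b0 = sub(p1, p0)          # first bond vector
    b1 = sub(p2, p1)          # central bond vector
    b2 = sub(p3, p2)          # last bond vector

    n1 = cross(b0, b1)        # normal to the plane p0, p1, p2
    n2 = cross(b1, b2)        # normal to the plane p1, p2, p3

    x = dot(n1, n2)
    y = math.sqrt(dot(b1, b1)) * dot(b0, n2)
    return math.degrees(math.atan2(y, x))
```

Each residue with both angles defined then contributes one (Phi, Psi) point to the plot.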
SMS is also accessible from the "Other Sources" section of the Structure Explorer for each PDB entry, under the category of Visualization resources.
Further information about this suite of tools is available from the SMS home page at http://mirrors.rcsb.org/SMS/, and at http://beta.rcsb.org/pdb/resources/help/help_results.html. Questions or comments are appreciated, and may be sent to firstname.lastname@example.org.
The latest PDB CD-ROM set (release #98) is currently being distributed. This release contains the macromolecular structure entries for the 16,121 structures available as of October 1, 2001. The CD-ROMs are produced quarterly and reflect the last update of the PDB Web site in March, June, September, and December. The experimental data (X-ray structure factors and NMR constraints) are also included, if available. Further information is available at http://www.rcsb.org/pdb/general_information/about_pdb/cdrom_distribution.html.
PDB deposition systems have been extended to accept data items specifically describing cryo-electron microscopy (cryo-EM) methods, for those EM methods that generate fitted coordinates. These items were developed in collaboration with members of the cryo-EM community, the EBI-MSD group, and the PDB.
New versions of ADIT (http://pdb.rutgers.edu/adit/ and http://pdbdep.protein.osaka-u.ac.jp/adit/) and AutoDep (http://autodep.ebi.ac.uk/) that support these new data items for cryo-EM depositions are available.
Further information about AutoDep is available at http://autodep.ebi.ac.uk/release-notes.html .
ADIT users can now select from the X-ray, NMR, or Cryo-EM deposition views. Further information about ADIT is available at http://pdb.rutgers.edu/ .
Cryo-EM data item definitions are available from the following sources:
A PDB file format template for REMARKs is available at:
Enzymes in the PDB have been classified in a hierarchical tree structure using the standards of the Enzyme Commission (EC). This classification permits users to search for enzymes by EC number or EC class/name through the SearchFields interface. It also allows users to "browse" through all enzymes using an Enzyme Browser interface, also available through SearchFields.
EC searches can be accomplished by selecting the "EC Number and Classification" option at the bottom of the SearchFields form, and then pressing the "New Form" button. New fields for "EC Number" and "Enzyme Class/Name" will then be available for searches based on these parameters. For example, searches for HIV-1 Protease in the PDB can be performed by entering "Retropepsin" in the "Enzyme Class/Name" field or by entering "3.4.23.16" in the "EC Number" field.
Selecting the "Browse and Select from Enzyme Classification" link under the "Enzyme Class/Name" box will launch the Enzyme Browser in a separate window. This interface allows users to navigate through the different levels of enzyme classification to arrive at a subset of particular interest. In the Enzyme Browser table, clicking on the "EC Number" column allows you to move up and down the EC tree. Clicking on a number in the "# of Structures in the PDB" column gives you access to all those structures in the "Query Result Browser". Selecting the "use" link for any row will bring that EC Number and Enzyme Class/Name into the query form so that you can search the PDB database for that specific item.
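The hierarchical nature of EC numbers can be illustrated with a short sketch. This is not the code behind SearchFields or the Enzyme Browser; the function names and the sample entry labels are hypothetical, and only the EC numbers themselves (for example, 3.4.23.16 for retropepsin) are real.

```python
def ec_levels(ec_number):
    """Split an EC number such as '3.4.23.16' into its tree levels,
    from the top-level class down to the full number."""
    parts = ec_number.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def in_class(ec_number, ec_class):
    """Return True if ec_number lies at or below ec_class in the EC tree.

    Comparing whole levels (rather than raw string prefixes) avoids
    treating '3.4.2' as a parent of '3.4.23.16'.
    """
    return ec_class in ec_levels(ec_number)


# Hypothetical entry labels mapped to real EC numbers.
sample_entries = {
    "entry_a": "3.4.23.16",   # retropepsin (HIV-1 protease)
    "entry_b": "3.4.21.4",    # trypsin
    "entry_c": "3.2.1.17",    # lysozyme
}


def browse(entries, ec_class):
    """Return the sorted labels of all entries under one EC class."""
    return sorted(label for label, ec in entries.items()
                  if in_class(ec, ec_class))
```

Calling browse(sample_entries, "3.4") in this sketch selects the two peptidase entries, which mirrors moving one level down the EC tree in the browser.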
Searches by enzyme name or classification match only the names and subclasses exactly as given in the Enzyme Commission nomenclature, or substrings thereof.
Enzyme Nomenclature (1992) Recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology. Academic Press, New York.
Related information is available at the following Web sites:
Further details on using this feature can be accessed at http://www.rcsb.org/pdb/resources/help/help_searchfields.html. A basic query tutorial that offers guidance on using other SearchFields features is available at http://www.rcsb.org/pdb/resources/tutorials/searching_archive.html.
The PDB has released an alpha version of the OpenMMS Toolkit, a suite of software tools that implement the Corba standard (OMG specification dtc/2001-04-06). The Corba specification provides a standard application programming interface (API) that allows remote programs direct access to the detailed experimental and macromolecular data available in the PDB. More information about the PDB and Corba is available at http://www.rcsb.org/pdb/general_information/news_publications/newsletters/2001q1/corba.html.
The OpenMMS Toolkit was developed to facilitate the use of macromolecular structure data by various scientific applications. In addition to a reference implementation that demonstrates the functionality of the OMG Corba specifications, the toolkit also contains software for parsing data files and for creating and loading a relational database.
Compiled and source-only distributions of OpenMMS are available at http://openmms.sdsc.edu/. Questions may be sent to email@example.com.
The PDB Annual Report 2001 is now available from the PDB Web site in PDF format at http://www.rcsb.org/pdb/general_information/news_publications/annual_reports/annual_report_year_2001.pdf. This document features a detailed look at the second full year of the RCSB's operation of the PDB from July 1, 2000 through June 30, 2001. It highlights PDB functions, accomplishments during this period, and plans for the coming year. Printed copies can also be obtained by sending your request and postal address to AnnualReport@rcsb.org.
The manual used as a guide by the PDB ADIT annotators for PDB Data Processing and Annotation is available online.
This document, a reference for the annotation staff, describes how the PDB data processing software system is used to produce the files that are released into the PDB archive. It is available in PostScript format from http://www.rcsb.org/pdb/info.html#File_Formats_and_Standards and can be viewed using readers such as Ghostview.
Since 1999, the Research Collaboratory for Structural Bioinformatics (RCSB) has maintained two distinct FTP sites: the RCSB Protein Data Bank (PDB) site at ftp://ftp.rcsb.org/ (and its mirrors; see http://www.rcsb.org/pdb/general_information/mirror_sites/index.html), and the Brookhaven National Laboratory (BNL) PDB site at ftp://bnlarchive.rcsb.org/.
In order to conserve resources and avoid the confusion arising from the existence of two distinct PDB FTP sites, the RCSB will phase out the BNL PDB archive as of March 1, 2002. This decision was made after consultation with the PDB Advisory Committee and review by members of the PDB user community.
The files currently available only at ftp://bnlarchive.rcsb.org/pub/resources/index/ will be made available at ftp://ftp.rcsb.org/pub/pdb/derived_data/index/.
Current users of the BNL PDB archive are encouraged to consider the option of mirroring the RCSB FTP archive. The RCSB FTP archive can be found at ftp://ftp.rcsb.org/ and instructions for mirroring it can be found at http://www.rcsb.org/pdb/ftpproc.final.html.
A Perl script is provided to assist with conversion of existing BNL FTP directory structure to the RCSB FTP directory structure. Further information about the script is available at ftp://ftp.rcsb.org/pub/pdb/software/bnl2rcsb.pl.
Please send your questions or concerns, or requests for assistance regarding this change, to firstname.lastname@example.org.
CIFTr is an application program that translates files in mmCIF format into files in PDB format. CIFTr works on UNIX platforms, and can be downloaded at http://pdb.rutgers.edu/software/. CIFTr also provides the option of producing a file with a blank chain ID field for structures with a single chain, and the option of producing files with standard IUPAC hydrogen nomenclature for standard L-amino acids.
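The blank chain ID option can be pictured with a small sketch. This is not CIFTr code; it is an illustration that assumes only the fixed-column PDB format, in which the chain identifier occupies column 22 of ATOM, HETATM, ANISOU, and TER records. The function name is hypothetical.

```python
# Record types that carry a chain identifier in column 22.
RECORDS_WITH_CHAIN = ("ATOM  ", "HETATM", "ANISOU", "TER   ")
CHAIN_COL = 21  # zero-based index of column 22


def blank_single_chain(lines):
    """Blank the chain ID field when the coordinates hold exactly one chain.

    Files with several chains are returned unchanged, since the chain ID
    is then needed to tell the chains apart.
    """
    chain_ids = {
        ln[CHAIN_COL]
        for ln in lines
        if ln[:6] in ("ATOM  ", "HETATM") and len(ln) > CHAIN_COL
    }
    if len(chain_ids) != 1:
        return list(lines)

    out = []
    for ln in lines:
        if ln[:6] in RECORDS_WITH_CHAIN and len(ln) > CHAIN_COL:
            ln = ln[:CHAIN_COL] + " " + ln[CHAIN_COL + 1:]
        out.append(ln)
    return out
```

Records of other types, such as REMARK or HEADER, pass through untouched in this sketch.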
CIFTr was released this summer along with the files from the Data Uniformity Project. These files, available in mmCIF format, can be accessed from the PDB beta FTP site at ftp://beta.rcsb.org/pub/pdb/uniformity/data/mmCIF/. Further information about the Data Uniformity project is accessible at http://www.rcsb.org/pdb/uniformity/.
The PDB produces a quarterly newsletter that describes the latest PDB developments and provides statistics about data deposition and data access.
The newsletter is distributed in plain text format through e-mail. A printed version is also distributed via postal mail.
The newsletter is available in PDF format from the PDB home page, and in both plain text and PDF formats from the newsletter archive page at http://www.rcsb.org/pdb/newsletter.html.
Subscriptions for the electronic version of the newsletter can be requested via the Subscribe link on the PDB home page. Printed copies can be requested by sending your name and postal address to email@example.com.
The Molecule of the Month series is a wonderful collection of short columns featuring a new PDB structure of interest each month. The columns describe the functions and significance of the selected biological macromolecules for a general audience, providing a basic understanding of structural interactions. Written and illustrated by Dr. David S. Goodsell of the Scripps Research Institute, this feature adds a unique aesthetic quality and an informative educational resource to the PDB Web site. You can access the Molecule of the Month installments at http://www.rcsb.org/pdb/education_discussion/molecule_of_the_month/index.html.
The release status of structures is determined at the time of deposition by the author. The status HPUB is used to indicate that a structure will be released when the corresponding journal article is published. Publication is considered to be when the article is distributed by the publisher, either in print or electronically. Structures are released when the PDB can confirm that the article corresponds with the entry.
The PDB receives publication dates and citation information from some journals. For other journals, the PDB scans the literature for publication information. We also greatly appreciate the citation information that is sent to us at firstname.lastname@example.org from the community.
An option that allows users to select a subset of structures from which homologous sequences have been largely removed is now available from the primary PDB Web site and its mirrors. This option, which is available from all PDB search interfaces, filters subsets of structures that match a particular query. The default threshold for sequence similarity removal for queries from the home page or SearchLite is 90%; SearchFields provides the option of selecting either 50, 70, or 90% similarity as cut-off values. Users can toggle between the complete set of results and the reduced subset by using the options menu at the top of the Query Result Browser.
Further information about this new feature is available at http://www.rcsb.org/pdb/redundancy.html. Questions or comments may be sent to email@example.com.
In order to provide the best access possible to the user community, the systems serving the primary distribution site have been upgraded. Two pairs of load-balanced Enterprise class Sun servers now administer the main Web server (www.rcsb.org/pdb) and the ftp server (ftp.rcsb.org/).
These redundant systems have been installed to ensure access -- even in the case of a hardware failure. These enhancements, in conjunction with the two independent network paths that now provide Web access to the sites, will allow PDB users to enjoy even more robust connectivity to PDB's resources.
Under the sponsorship of the NIGMS, the PDB has created a centralized registration database for target sequences from the NIH P50 structural genomics projects at http://targetdb.pdb.org/.
Target sequences are collected weekly from each of the seven NIH structural genomics centers: the Berkeley Structural Genomics Center, the Joint Center for Structural Genomics, the Midwest Center for Structural Genomics, the Northeast Structural Genomics Consortium, the New York Structural Genomics Research Consortium, the Southeast Collaboratory for Structural Genomics, and the Tuberculosis Structural Genomics Consortium.
The target database can be searched by sequence using FASTA (Pearson, W.R. and Lipman, D.J. (1988) "Improved tools for biological sequence comparison" PNAS 85:2444-2448). Sequence searches may include only the P50 target sequences or the P50 and PDB sequences. Target sequences may also be searched by contributing P50 site, protein name, project tracking identifier, date of last modification, and the current status of the target (e.g. cloned, expressed, crystallized, ...). Search results may be viewed as HTML reports, FASTA data files, or in XML.
Target data for all of the NIH projects can be downloaded as an XML document. The XML document is organized following the recommendations of the International Task Forces on Target Tracking (see http://www.nigms.nih.gov/news/meetings/airlie.html for more information). This document type definition for the target data file can be retrieved from http://targetdb.pdb.org/apps/target.dtd.
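For users who save search results as FASTA data files, the following is a minimal sketch of reading such a file into (header, sequence) pairs. It assumes only the general FASTA layout (a ">" header line followed by one or more sequence lines); the function name is hypothetical, and the exact header fields written by the target database are not assumed here.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples.

    Sequence lines belonging to one record are joined together; blank
    lines and any text before the first header are ignored.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```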
While the PDB Newsletter was started in 1974, only issues dating back to 1993 have been accessible on-line until recently. Early paper copies have been scanned and are now available at ftp://ftp.rcsb.org/pub/pdb/doc/newsletters/old_bnl/ in PDF format (see http://www.adobe.com/products/acrobat/alternate.html for a free reader).
Sixty-three PDB newsletters, ranging from September 1974 to January 1993, are included in this set. Changes in technology become very evident as you read through these newsletters. The earliest were prepared on a typewriter and some by pasting sections together. The later newsletters resemble the printed versions of today.
The history of the Protein Data Bank - the growth of the resource, the means of delivery of the data, and the evolution of standard formats - can be traced through these newsletters. We hope you enjoy them.
The PDB will be at the 20th European Crystallographic Meeting in Krakow, Poland from August 25th-31st, 2001. A talk entitled "Data Deposition and Data Processing" will be given by Bohdan Schneider as part of the Crystallographic Computing Databases Microsymposia on Thursday (August 30), and a PDB poster will be presented on Sunday and Monday (S9.M3.P2; August 26-27). We look forward to seeing you there!
A chapter describing the PDB's systems for the data resource has been published in the International Tables for Crystallography:
"The Protein Data Bank, 1999 -". H.M. Berman, J. Westbrook, Z. Feng, G.L. Gilliland, T.N. Bhat, H. Weissig, I.N. Shindyalov, and P.E. Bourne. International Tables for Crystallography. Volume F: Crystallography of Biological Macromolecules. M.G. Rossmann and E. Arnold, Editors. Dordrecht: Kluwer Academic Publishers. The Netherlands.
Also included are chapters on the Nucleic Acid Database, The Cambridge Structural Database, the Biological Macromolecule Crystallization Database, and the history of the PDB at Brookhaven.
Further information about the International Tables is available from the IUCr at http://www.iucr.ac.uk/iucr-top/it/index.html.
The Protein Data Bank holdings contain considerable redundancy in sequence and structure. An option that allows users to select a subset of structures from which homologous sequences have been largely removed is now available from the Beta Test Site. This option, which is available from all search interfaces on the beta site, filters subsets of structures that match a particular query.
Removing sequence homologues from queries via the home page and SearchLite returns representatives of protein structures with less than 90% sequence similarity. SearchFields provides the option of selecting either 50, 70, or 90% similarity as cut-off values. The user can then toggle between the complete set of results and the reduced subset by using the options menu at the top of the Query Result Browser.
While sequence homology is defined on a per chain basis, results are returned on a structure basis. Results may differ from other non-redundant sets outside the PDB. The CD-HIT algorithm (Cluster Database at High Identity with Tolerance) is used to remove redundant sequences and leave only the representatives (Li, W., Jaroszewski, L. and Godzik, A.; Bioinformatics, (2001) 17:282-283). CD-HIT can be found at http://bioinformatics.ljcrf.edu/cd-hi/ .
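The general idea of removing redundant chains and then reporting results per structure can be sketched as follows. This is a simplified illustration of greedy incremental clustering, not the CD-HIT program itself: the similarity measure is a stand-in based on Python's difflib rather than a true sequence alignment, and the chain identifiers of the form "structure_chain" are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Crude stand-in for sequence identity (an assumption of this sketch);
    a real tool would use an alignment-based measure."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def representatives(chains, threshold=0.9):
    """Greedy incremental clustering over a dict of chain_id -> sequence.

    Chains are visited longest first; a chain becomes a new representative
    only if it is below the threshold against every representative so far.
    """
    reps = []
    for cid in sorted(chains, key=lambda c: (-len(chains[c]), c)):
        if all(similarity(chains[cid], chains[r]) < threshold for r in reps):
            reps.append(cid)
    return reps


def structures(chain_ids):
    """Collapse per-chain results to a per-structure list."""
    return sorted({cid.split("_")[0] for cid in chain_ids})
```

Lowering the threshold (for example to 0.7 or 0.5) merges more chains into each cluster and so leaves fewer representatives.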
Further information about this new feature is available at http://beta.rcsb.org/pdb/redundancy.html. Questions or comments on this feature may be sent to firstname.lastname@example.org.
Helen M. Berman, Director of the PDB, spoke at a hearing held by the House Science Subcommittee on Research that examined the impact federal investment has had on promoting innovation in information technology.
At this session "Innovation in Information Technology: Beyond Faster Computers and Higher Bandwidth", Prof. Berman described how the developments in technology have influenced the growth of the PDB. She also described how the way the PDB archives and distributes data has changed as computer and information technologies have advanced.
Details of the hearing are available at http://www.house.gov/science/research/reshearings.htm.
Thanks to all the conference participants who visited the PDB exhibit at the Intelligent Systems for Molecular Biology (ISMB) conference in Copenhagen, Denmark. We would also like to thank the attendees who stopped by the PDB poster at the Protein Society's annual symposium in Philadelphia, PA. Your support and feedback are always appreciated!
The PDB thanks everyone who visited our exhibit booth and poster at the American Crystallographic Association's Annual Meeting in Los Angeles, CA. The users lunch was a great success, and we would like to thank the companies that helped to support this event: GlaxoSmithKline, IBM, Merck, Pharmacia, Procter & Gamble, and the Schering-Plough Research Institute.
The PDB will be presenting at the 15th Symposium of the Protein Society (July 28-August 1, Philadelphia, PA) at poster d50 on all three poster session days.
The ADIT deposition and annotation site established at the Institute for Protein Research at Osaka University in Osaka, Japan has been operational for a year. Entries deposited at this site have been processed by staff at the Laboratory of Protein Informatics (Head, Professor Haruki Nakamura) at the Institute for Protein Research at Osaka University. Under the direction of Drs. Masami Kusunoki and Genji Kurisu, these entries are processed using ADIT by Takashi Kosada, Reiko Igarashi and Yumiko Kengaku, and are incorporated into the PDB archive.
Part of the success of this cooperative agreement is due to the productive visits that the RCSB and Osaka group members have made to both sites. We look forward to continuing collaborations with this group.
In addition to the ADIT Osaka site at http://pdbdep.protein.osaka-u.ac.jp/adit/, depositions to the PDB can also be made at the RCSB-Rutgers site (ADIT; http://pdb.rutgers.edu/adit/) and at the European Bioinformatics Institute (AutoDep; http://autodep.ebi.ac.uk/).
A recent visit with Kyle Burkhardt (RCSB) and the Osaka group in Japan.
Back row: Reiko Igarashi, Takashi Kosada, Kyle Burkhardt, Yumiko Kengaku
Front row: Genji Kurisu, Masami Kusunoki
All of the released PDB entries are now available in mmCIF format from the PDB beta ftp site at ftp://beta.rcsb.org/pub/pdb/uniformity/data/mmCIF/. Comments on these data are welcome.
The files follow the latest version of the mmCIF dictionary supplemented by an exchange dictionary developed by the PDB and the European Bioinformatics Institute. This exchange dictionary can be obtained from http://pdb.rutgers.edu/mmcif/.
An application program called CIFTr is available for translating files in mmCIF format into files in PDB format. CIFTr works on UNIX platforms, and can be downloaded at http://pdb.rutgers.edu/software/. CIFTr also provides the option of producing a file with a blank chain ID field for structures with a single chain, and the option of producing files with standard IUPAC hydrogen nomenclature for standard L-amino acids.
In the next week, the PDB will be exhibiting at the ACA and ISMB meetings. We hope to see you there:
American Crystallographic Association's Annual Meeting (July 21-26, Los Angeles, CA, USA). The PDB will be exhibiting in booth number 111 and will be having a users lunch on Tuesday, July 24 at noon in room Beaudry B.
Intelligent Systems for Molecular Biology 9th International Conference (July 21-25, Copenhagen, Denmark). PDB members will be exhibiting at the ISMB meeting in the Tivoli Gardens.
The summer issue of the PDB's quarterly newsletter is available from the PDB Web site at http://www.rcsb.org/pdb/newsletter/2001q2/index.html. This periodical features news articles that cover the previous three months of PDB's progress, as well as the current states of Data Deposition; Data Query, Reporting, Access, and Distribution; and Outreach. This issue is available in .pdf format at http://www.rcsb.org/pdb/general_information/news_publications/newsletters/2001q2/PDBnewsletter2001q2.pdf. A plain text copy is also available from the ftp site at ftp://ftp.rcsb.org/pub/pdb/doc/newsletters/rcsb/newsletter10.txt.
PDB users can now enjoy further enhanced access to the primary PDB Web site at http://www.rcsb.org/pdb/ thanks to enhanced connectivity at the SDSC-PDB site. SDSC now has 40 Mbits of exclusive Internet bandwidth, almost doubling its Internet capacity, with connectivity to two independent sources. This will provide even more reliable access to this PDB site.
The PDB has updated the information included on the structural genomics page at http://www.rcsb.org/pdb/structural_genomics/index.html. The purpose of this page is to provide an entry point to additional information on structural genomics relevant to PDB users. Links and other structural genomics-related developments may be provided to the PDB by sending e-mail to email@example.com.
After a favorable period of testing on the beta site, the PDB production site now offers sequence data before the release of the corresponding coordinate data through the PDB status search at http://www.rcsb.org/pdb/status.html. Users may query all available sequences, or query based on criteria such as title or deposition date.
PDB depositors are given the opportunity to prerelease a sequence in advance of the coordinates. This decision is solely at the discretion of the depositor, who may also choose to hold the sequence until the structure is released.
The prerelease of sequence data will allow users to conduct blind tests of structure prediction and modeling techniques. It could also help prevent unintended duplication of effort in structure determination.
This feature was developed in response to requests made to the PDB.
The expedited availability of sequence information is part of PDB's efforts to enable all areas of science. Questions regarding this new feature can be sent to firstname.lastname@example.org.
The PDB Validation Suite is a set of application programs that create validation reports about 3D structure data. It is designed to work with files in mmCIF or PDB format.
The beta version of this software can be downloaded in binary form for SGI, SUN, and Linux platforms from http://pdb.rutgers.edu/software/.
This software is used in the Validation step of ADIT (AutoDep Input Tool) at http://pdb.rutgers.edu/adit/ and at the Validation Server at http://pdb.rutgers.edu/validate/.
Reports produced include an Atlas entry, a summary report, and a collection of structural diagnostics including bond distance and angle comparisons, torsion angle comparisons, base morphology comparisons (for nucleic acids), and molecular graphic images. In addition, reports from PROCHECK (1), NUCheck (2), and SFCHECK (3) are also made available.
Questions, comments, and suggestions should be sent to email@example.com.
1. R.A. Laskowski, M.W. MacArthur, D.S. Moss, J.M. Thornton (1993): PROCHECK: a program to check the stereochemical quality of protein structures. J. Appl. Cryst. 26, pp. 283-291.
2. Z. Feng, J. Westbrook, H.M. Berman (1998): NUCheck. NDB-407 Rutgers University, New Brunswick, NJ.
3. A.A. Vaguine, J. Richelle, S.J. Wodak (1999): SFCHECK: a unified set of procedures for evaluating the quality of macromolecular structure-factor data and their agreement with the atomic model. Acta Crystallogr. D55, pp. 191-205.
The PDB home page now has a new look, thanks to feedback from PDB users and Prof. Cherri Pancake of Oregon State University, a usability engineer on sabbatical at SDSC. The redesign improves the PDB's home page as a portal to information for experts and newcomers alike. Links to mirror sites, help topics, and a query tutorial are prominently displayed. Searches by keyword or PDB ID are immediately available. The PDB will continue to improve the design and layout of the PDB Web site.
Issue 9 of the PDB newsletter is now available in .pdf format at http://www.rcsb.org/pdb/general_information/news_publications/newsletters/2001q1/PDBnewsletter2001q1.pdf. Printed copies can be requested by sending your mailing address to firstname.lastname@example.org.
The latest PDB CD-ROM (release #96) is currently being distributed. This release contains the macromolecular structure entries and the experimental data (where available) for the 14,731 structures available as of the March 28, 2001 update of the PDB Web site. Further information is available at http://www.rcsb.org/pdb/general_information/about_pdb/cdrom_distribution.html.
In response to questions about the PDB file format, we have compiled relevant documents at http://www.rcsb.org/pdb/info.html#File_Formats_and_Standards.
The PDB file format is mainly described in the PDB Contents Guide that was released by Brookhaven National Laboratory on December 20, 1996.
Any changes made to the PDB file format by the RCSB have been distributed to the PDB mailing list for discussion and review before they have been incorporated. These notices are archived at this site.
The RCSB has written several documents to accompany the PDB Contents Guide, including a Format FAQ to answer frequently asked questions about the PDB format, and a list of common format errors that have been submitted to the PDB.
Please bookmark this link where we will continue to post information about the PDB file format.
The publication Science Watch has named "The Protein Data Bank" (H.M.Berman, J.Westbrook, Z.Feng, G.Gilliland, T.N.Bhat, H.Weissig, I.N.Shindyalov, P.E.Bourne (2000): Nucleic Acids Research 28, pp. 235-242) as the second-most cited research paper of 2000. ("The Hottest Research of 1999-2000" (2001): Science Watch 12, p. 1).
A new version of the PDB home page has been implemented on the beta Web site. This new design was created in response to suggestions from the user community. It maximizes the utilization of space, minimizes images and scrolling, and is meant to be useful to newcomers and experts alike. Mirror sites are now accessible directly from this page, as are links to a variety of help topics. Most notably, SearchLite keyword search capabilities have been made available directly from the home page. Queries by PDB IDs can also be performed from the same search box as keyword search criteria.
Your comments on this page are greatly appreciated and can be sent to email@example.com.
The 9th issue of the PDB newsletter is now available from the PDB Web site. This document features news items from the previous quarter and summarizes the current states of Data Deposition; Data Uniformity and mmCIF; Data Query, Reporting, Access, and Distribution; and Outreach. A plain text copy is also available from the ftp site. If you would like a printed copy of PDB Newsletter 9, please send your request to firstname.lastname@example.org.
The following new features have been implemented on the PDB Web site and its mirrors:
Molecular Interactive Collaborative Environment (MICE)
The MICE collaborative molecular viewer is now available from the View Structure page for each entry. MICE permits remote users to share a VRML-based view of a molecule via the Internet. This view is referred to as the "molecular scene." One user publishes the scene and any number of users subscribe to the scene. By mutual consent, usually via telephone, any participant can become the publisher. At this time, MICE is supported as a signed applet on Windows machines. Further details can be obtained by visiting the MICE Web site at http://mice.sdsc.edu/ or by clicking on the help link accessible from the View Structure page.
An additional feature has been added to the PDB Web site which allows PDB users to create their own customized tabular reports. By selecting "Create A Tabular Report" from the option scroll bar at the top of the Query Result Browser page, users can choose from a variety of parameters that will be used to generate a report from a result list. For example, choices are available for either entire citations or authors only, or for both citations and structure summaries. Previously available report options remain accessible as well: Cell Dimensions, Primary Citation, Structure Identifier, Sequence, Experimental Technique, Refinement Information, and Data Collection information.
Please send questions or comments on these new features to email@example.com.
"The Structures of Life" is a free booklet geared toward an advanced high school or early college-level audience. It explains how structural biology provides insight into health and disease. The booklet contains a general introduction to proteins, a chapter each on X-ray crystallography and NMR, and a chapter on structure-based drug design. It also features "Student Snapshots" designed to inspire young people to consider careers in biomedical research.
This resource is available online at http://www.nigms.nih.gov/news/science_ed/structlife.pdf.
"The Structures of Life", and other science education booklets, are published by the National Institute of General Medical Sciences, a component of the National Institutes of Health. To order any or all of these free booklets, call (301) 496-7301, send e-mail to firstname.lastname@example.org, or order online at http://www.nigms.nih.gov/news/publist.html.
On October 24, 2000, the PDB asked for feedback regarding the release of sequence data before the coordinate data are released (http://www.rcsb.org/pdb/lists/pdb-l/200010/msg00038.html). This proposal was made in response to requests made to the PDB.
The prerelease of sequence data would allow users to conduct blind tests of structure prediction and modeling techniques. It could also help prevent unintended duplication of effort in structure determination.
The community's reaction to this proposal was favorable. In response, PDB has added an option to the deposition process that gives the depositor the opportunity to prerelease a sequence in advance of the coordinates. This decision is solely at the discretion of the depositor, who may also choose to hold the sequence until the structure is released. The current default is to hold the sequence.
If the author chooses to release a sequence before the structure is released, the sequence information will appear in the beta site status search results (http://beta.rcsb.org/pdb/status.html) after the author approves the processed entry. Users may query all available sequences, or query based on criteria such as title or deposition date, or the availability of a sequence.
The expedited availability of sequence information is part of PDB's efforts to enable all areas of science. Questions regarding this new feature can be sent to email@example.com.
The Second International Structural Genomics Meeting was held at the Airlie Conference Center in Warrenton, Virginia, on April 4-6, 2001. The meeting focused on fostering and formalizing international interactions in structural genomics. Task forces reported on the goals of the efforts and policies on data release and deposition, publication, intellectual property, and cooperation with industry. Further information about the meeting will be made available on the NIGMS Web site.
The PDB is pleased to announce the availability of two new features on its beta Web site:
The MICE collaborative molecular viewer is now available from the View Structure page for each entry. MICE permits remote users to share a VRML-based view of a molecule via the Internet. This view is referred to as the "molecular scene." One user publishes the scene and any number of users subscribe to the scene. By mutual consent, usually via telephone, any participant can become the publisher.
At this time, MICE is supported as a signed applet on Windows machines. Further details can be obtained by visiting the MICE Web site or the help page accessible from the View Structure page.
An additional feature has been added to the beta site which allows PDB users to create their own customized tabular reports. By selecting "Create A Tabular Report" from the option scroll bar at the top of the Query Result Browser page, users can choose from a variety of parameters that will be used to generate a report from a result list. For example, choices are available for either entire citations or authors only, or for both citations and structure summaries. Previously available report options remain accessible as well: Cell Dimensions, Primary Citation, Structure Identifier, Sequence, Experimental Technique, Refinement Information, and Data Collection information.
Questions or comments about these features may be sent to firstname.lastname@example.org.
On Tuesday, February 27, 2001, the Board of Directors of the Object Management Group (OMG) voted to adopt the Common Object Request Broker Architecture (CORBA) Macromolecular Structure Specification. This specification opens the door to more seamless and specific access to PDB data. More specifically, it provides a standard application programming interface (API) that will allow direct access by remote programs to the binary data structures of the PDB. Designed in collaboration with the International Union of Crystallography (IUCr), the new standard is based on the Macromolecular Crystallographic Information File (mmCIF) data representation (Bourne et al. 1997). Unlike current access in which users are required to retrieve and parse complete PDB files, an implementation of this CORBA API will allow applications to retrieve a single data item from a remote PDB server and import it for use in a local application.
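The difference between the two access styles described above can be illustrated with a small local sketch. This is not CORBA code and not the adopted specification: the classes `FileServer` and `ItemServer`, their methods, the entry ID `XXXX`, and all values are hypothetical, and everything runs in one process. It only contrasts "fetch the whole entry, then parse" with "ask for one named data item," using mmCIF-style item names.

```python
# Hypothetical entry content keyed by mmCIF-style item names (values invented).
ENTRIES = {
    "XXXX": {
        "_struct.title": "Hypothetical example entry",
        "_cell.length_a": "58.39",
        "_cell.length_b": "86.70",
    }
}


class FileServer:
    """File-style access: the client receives the whole entry as text."""

    def fetch_entry(self, entry_id):
        items = ENTRIES[entry_id]
        return "\n".join(f"{name}\t{value}" for name, value in items.items())


class ItemServer:
    """Item-style access: the client names one data item and gets only its value."""

    def get_item(self, entry_id, item_name):
        return ENTRIES[entry_id][item_name]


def cell_a_via_file(server, entry_id):
    # The complete entry is transferred and parsed just to read one number.
    text = server.fetch_entry(entry_id)
    parsed = dict(line.split("\t", 1) for line in text.splitlines())
    return float(parsed["_cell.length_a"])


def cell_a_via_item(server, entry_id):
    # Only the requested value is transferred; no client-side file parsing.
    return float(server.get_item(entry_id, "_cell.length_a"))


file_result = cell_a_via_file(FileServer(), "XXXX")
item_result = cell_a_via_item(ItemServer(), "XXXX")
```

Both functions return the same value; the point of the sketch is that the item-style client never handles the rest of the entry.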
CORBA provides a platform- and programming-language-neutral mechanism for specifying distributed object-oriented interfaces. The OMG, which oversees the development of CORBA and several other open standards for object-oriented computing, also charters groups such as the Life Sciences Research (LSR) Task Force for work in specific application domains. In addition to macromolecular structure, the LSR has defined, or is currently working on, interface specifications in areas such as sequence analysis, gene expression, and laboratory equipment control. Collectively, these specifications should provide a robust framework for the development and integration of key data resources required by the structural biology community.
This initiative was led by Dr. Douglas Greer of the San Diego Supercomputer Center (SDSC). Dr. Greer is also the chair of the Macromolecular Structure Finalization Task Force (FTF), a newly created entity within the OMG chartered to make any changes to the specification that are necessary for implementation. A reference implementation with source code is expected to be publicly available from the PDB in the next year.
P.E.Bourne, H.M.Berman, B. McMahon, K. Watenpaugh, J. Westbrook and P.M.D. Fitzgerald. Methods in Enzymology (1997) 277, 571-590.
The Macromolecular CIF Dictionary
Additional information is available at the following sites:
OMG and LSR: http://www.omg.org/
Thanks to all who visited the PDB exhibit booth at the Pittsburgh Conference (PITTCON) held last week in New Orleans, LA. We appreciate your support, and hope to see you at future conferences!
The PDB Annual Report is now available from the PDB Web site in PDF format at http://www.rcsb.org/pdb/general_information/news_publications/annual_reports/annual_report_year_2000.pdf. This document features a detailed look at the first full year of the RCSB's operation of the PDB from July 1, 1999 through June 30, 2000. It highlights PDB functions, accomplishments during this period, and plans for the coming year. Printed copies can also be obtained by sending your request and postal address to AnnualReport@rcsb.org.
The PDB will participate in the exposition at the Pittsburgh Conference (PITTCON), to be held at the Morial Convention Center in New Orleans, LA, on March 5-8. PDB staff will be available to answer your questions at booth 3410. We hope to see you there!
Thanks to all who visited the PDB exhibit and attended the talk given by Dr. Helen Berman at the Biophysical Society's Annual Meeting in Boston, MA. We appreciate the feedback received from the user community.
The latest PDB CD-ROM (release #95) is currently being distributed. This release contains the macromolecular structure entries and the experimental data when available for the 14,040 structures available as of the December 26, 2000 update of the PDB Web site. With this release, the CD-ROM set has grown to six disks.
To assist MS Windows users, additional resources for uncompressing the .gz files are included.
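For readers who would rather script the decompression, .gz files can also be expanded on any platform with the Python standard library. The following is a minimal sketch and is not one of the resources shipped on the CD-ROM; the function name `gunzip` and the file name in the usage comment are hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(src, dest=None):
    """Decompress a .gz file and return the path of the uncompressed copy.

    If dest is not given, the output name is the input name with its
    final suffix removed (for example "entry.ent.gz" -> "entry.ent").
    """
    src = Path(src)
    dest = Path(dest) if dest is not None else src.with_suffix("")
    # Stream the data in chunks so large files need not fit in memory.
    with gzip.open(src, "rb") as compressed, open(dest, "wb") as plain:
        shutil.copyfileobj(compressed, plain)
    return dest


# Usage (hypothetical file name):
#     gunzip("pdb_entry.ent.gz")   # writes pdb_entry.ent next to the archive
```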
Browse the PDB CD-ROM set documentation at http://www.rcsb.org/pdb/home/cdrom_distribution.html for more information on the CD-ROM contents; on-line ordering information is also available from that site.
The PDB has compiled a variety of structural genomics links as part of the Web site. The purpose of this page is to provide an entry point to additional information on structural genomics relevant to PDB users. Links and other structural genomics-related developments may be provided to the PDB by sending e-mail to email@example.com.
Included on this page is information about the Second International Structural Genomics Meeting, which will be held at the Airlie Conference Center in Warrenton, Virginia, April 4-6, 2001. Information on the meeting and application process can be found on the National Institute of General Medical Sciences' Web page for structural genomics at http://www.nigms.nih.gov/funding/psi.html and on the application page.
The pdb-l list is a forum for PDB users to collaborate and distribute information. While the RCSB maintains this resource for the community and does not moderate the list, the recent dissemination of a computer virus has made it necessary for us to modify our policy.
Effective February 1, 2001, messages with attachments that are sent to firstname.lastname@example.org will not be accepted. Subscription information and an archive of pdb-l messages are available at http://www.rcsb.org/pdb/education_discussion/discussion_forum/index.html.
The PDB will be an exhibitor at the Biophysical Society's Annual Meeting on February 17-21, in Boston, MA. We hope that you will stop by booth 1204 to say hello and to pick up a PDB temporary tattoo!
Dr. Helen Berman, a 2001 Biophysical Society Fellow, will be giving a talk on the PDB and the Nucleic Acid Database at the Databases for Biophysicists workshop at this meeting (Sunday, February 18, 7:30pm, Ballroom B).
The Second International Structural Genomics Meeting will be held at the Airlie Conference Center in Warrenton, Virginia, April 4-6, 2001. Information on the meeting and application process can be found on the National Institute of General Medical Sciences' Web page for structural genomics at http://www.nigms.nih.gov/funding/psi.html and on the application page.
Space is limited and applications should be completed as soon as possible.
The Protein Data Bank was reviewed in a Web Report in Genome Biology which is available at http://genomebiology.com/2000/1/6/reports/2056/.
The router at the SDSC PDB Web site at http://www.rcsb.org/pdb/ will be upgraded on Thursday, February 1, after 6:00pm PST. While we expect minimal downtime, this is a good time for you to bookmark one of the RCSB PDB mirrors listed at http://www.rcsb.org/pdb/general_information/mirror_sites/index.html.
The CIF (Crystallographic Information File) format, a subset of STAR (Self-defining Text Archive and Retrieval format), is suitable for archiving all types of text and numerical data, in any order. CIF's usefulness derives from its generality, upward compatibility, and flexibility. Recently, the RCSB released a set of simple object-oriented Perl modules and scripts for parsing STAR-compliant data files and dictionaries, such as mmCIF. Users with a working knowledge of Perl and a basic familiarity with CIF or other STAR-compliant data file formats will benefit from these tools. Modules included in this distribution are:
The included scripts are a mixture of basic utility scripts (e.g. parse.pl or check.pl) and very simple examples that are meant to test certain methods in the modules (e.g. create.pl). Users can also write their own customized scripts.
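To give a sense of the data layout such tools handle, here is a minimal Python sketch of reading a simplified CIF/STAR text. It is not the RCSB Perl distribution and does not reproduce its interface: the function names and the sample data are hypothetical, and the sketch assumes no multi-line (semicolon-delimited) text fields, no save frames, and no quoted values that begin with an underscore or contain a "#" character.

```python
import shlex


def tokenize(text):
    """Split text into tokens, honoring quotes and dropping '#' comments."""
    tokens = []
    for line in text.splitlines():
        tokens.extend(shlex.split(line, comments=True))
    return tokens


def is_keyword(tok):
    """True for tokens that end a run of loop values."""
    return tok == "loop_" or tok.startswith(("_", "data_"))


def parse_cif(text):
    """Return {block_name: {item_name: value, or list of values for loops}}."""
    tokens = tokenize(text)
    blocks, items, i = {}, None, 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("data_"):
            items = blocks.setdefault(tok[len("data_"):], {})
            i += 1
            continue
        if items is None:
            raise ValueError(f"{tok!r} appears before any data_ block")
        if tok == "loop_":
            i += 1
            names = []
            while i < len(tokens) and tokens[i].startswith("_"):
                names.append(tokens[i])
                i += 1
            values = []
            while i < len(tokens) and not is_keyword(tokens[i]):
                values.append(tokens[i])
                i += 1
            if not names or len(values) % len(names) != 0:
                raise ValueError("loop_ values do not fill whole rows")
            # Values are stored row by row; slice out one column per item name.
            for col, name in enumerate(names):
                items[name] = values[col::len(names)]
        elif tok.startswith("_"):
            if i + 1 >= len(tokens):
                raise ValueError(f"item {tok!r} has no value")
            items[tok] = tokens[i + 1]
            i += 2
        else:
            raise ValueError(f"unexpected token {tok!r}")
    return blocks


# Hypothetical sample data (values invented for illustration).
SAMPLE = """\
data_EXAMPLE
_struct.title  'Hypothetical example entry'   # quoted value with spaces
_cell.length_a 58.39
loop_
_atom_site.id
_atom_site.type_symbol
1 N
2 C
3 O
"""

blocks = parse_cif(SAMPLE)
```

All values are kept as strings, so a caller that needs numbers (for example the cell length) converts them explicitly.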
To download these modules or for more information, please refer to the documentation at http://pdb.sdsc.edu/STAR/.
Additional mmCIF resources are available at http://pdb.rutgers.edu/mmcif/.
The 8th issue of the PDB newsletter is now available from the PDB Web site. This document features news items from the previous quarter and summarizes the current states of Data Deposition; Data Uniformity; Data Query, Reporting, Access, and Distribution; and Outreach. A printer-friendly version of this document is available, and a plain text copy is also accessible from the ftp site. If you would like a printed copy of PDB Newsletter 8, please send your request to email@example.com.
The PDB has compiled a list of mmCIF (macromolecular Crystallographic Information File) resources at http://pdb.rutgers.edu/mmcif/. The mmCIF dictionary and a set of dictionary extensions are used by the PDB team for all aspects of data processing. The mmCIF resources page provides links to articles, data dictionaries, format correspondences, the RCSB's response to the OMG RFP for a CORBA API for Macromolecular Structure, and tutorials. This page also provides links to various software programs, including Star (CIF) Parser, the new program developed by the PDB.
The PDB has published a paper, "The PDB data uniformity project," in the database issue of Nucleic Acids Research. It describes the ongoing effort to address inconsistencies in the archive.
Updates on the Data Uniformity Project are available at http://www.rcsb.org/pdb/resources/uniformity/index.html.
T.N. Bhat, P.E. Bourne, Z. Feng, G. Gilliland, S. Jain, V. Ravichandran, B. Schneider, K. Schneider, N. Thanki, H. Weissig, J. Westbrook, H.M. Berman (2001): The PDB data uniformity project. Nucleic Acids Research 29 (1), pp. 214-218.
RCSB PDB is managed by two members of the Research Collaboratory for Structural Bioinformatics: Rutgers and UCSD/SDSC.
The RCSB PDB is funded by a grant (DBI-1338415) from the National Science Foundation, the National Institutes of Health, and the US Department of Energy.