New PDB Web Tools

  1. SearchLite
  2. ADIT
  3. Validation Server

SearchLite: Version 1.0 of the PDB Web Interface

The RCSB has developed a set of query and structure reporting tools for use with the PDB. These tools will continue to evolve with each release of the RCSB PDB Web site. Two sets of upwardly compatible enhancements to the query capability are planned for release later in 1999.

A user accesses the PDB through the Web by making a query and receiving a result. These actions are shown on the left- and right-hand sides of Figure 1, respectively. Each of these two components will evolve with every new release of the Web interface. This article discusses only version 1.0, which is available now; future newsletters will discuss new versions as they become available. To simplify the discussion, the query and the subsequent result are discussed separately.

Figure 1. PDB Query and Result Overview

Query

Version 1.0 of the query interface is a keyword query: a user supplies one or more keywords, and all structures that contain those keywords are returned. The search for keywords is performed on the contents of the PDB files. To make a more specific query, a user may pose the query to a subset of PDB record types. For example, a search for Jones will find many structures for which Jones is not an author but which, for example, were solved using the map-fitting programs developed by Alwyn Jones. Restricting the search field to just author returns structures for which Jones was an author according to PDB JRNL and REMARK 1 records.

Multiple keywords can be used in the same query. The query protein kinase (without quotes) returns all structures that contain both of the terms "protein" and "kinase." Since "protein" is a general term that appears in nearly all structures, this is effectively a search for "kinase" and will return, for example, "histidine kinase," which is not a member of the protein kinase family. Enclosing the search term in quotes ("protein kinase") requires the keywords to be contiguous, which returns a set of structures closer to those belonging to the protein kinase family. However, it also returns structures that contain the phrase "protein kinase inhibitor," which may not be desired. No NOT clause is available at present to exclude the keyword "inhibitor," so results from such queries must be trimmed or added to manually as desired.
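
The difference between an unquoted and a quoted query can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the RCSB implementation: the function names parse_query and matches are hypothetical, matching is assumed to be case-insensitive substring matching, the text of each record is assumed to start after column 10, and the two COMPND lines are made-up sample data.

```python
import shlex


def parse_query(query):
    """Split a query into lowercase terms; a double-quoted phrase stays one term."""
    return [term.lower() for term in shlex.split(query)]


def matches(pdb_text, query, record_types=None):
    """Return True if every query term occurs in the selected records."""
    lines = pdb_text.splitlines()
    if record_types is not None:
        # Columns 1-6 of a PDB line hold the record name (e.g. JRNL, COMPND).
        lines = [ln for ln in lines if ln[:6].strip() in record_types]
    # Assumption: drop the first 10 columns and collapse runs of whitespace
    # so that a quoted phrase can match regardless of column padding.
    haystack = " ".join(" ".join(ln[10:] for ln in lines).lower().split())
    return all(term in haystack for term in parse_query(query))


# Made-up sample records, for illustration only.
histidine = "COMPND    HISTIDINE KINASE PROTEIN CHEA"
inhibitor = "COMPND    PROTEIN KINASE INHIBITOR PEPTIDE"

print(matches(histidine, "protein kinase"))    # both words occur somewhere
print(matches(histidine, '"protein kinase"'))  # the words are not contiguous
print(matches(inhibitor, '"protein kinase"'))  # phrase occurs; nothing excludes "inhibitor"
```

The three calls print True, False, and True. Passing record_types=["JRNL"] restricts the search to JRNL records, in the spirit of the author-only search described earlier.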

Results

Figure 1 shows the two types of result presented for a given query -- a single-structure result or a multiple-structure result. For example, entering a specific PDB identifier will return a single structure, whereas a phrase such as "protein kinase" will return multiple structures. Each case is discussed separately.

Single Structure

A single-structure result is presented as an Explore page. As the name suggests, the page provides the opportunity to explore several different aspects of the structure further, using resources both from the PDB and elsewhere on the Internet. The Explore page presents summary information about the structure, including the release date, author names, compound name, and primary citation, together with a dynamic list of options based on what information is available for the specific structure. For example, if the structure is a crystal structure of a nucleic acid or nucleic acid complex, a link will appear to the Nucleic Acid Database Atlas page (NDB; Berman et al. (1992) Biophys. J. 63(3), 751-9). The NDB Atlas Entry provides further summary information, hand-curated images, and highly curated coordinate files. Similarly, if a previous version or versions of a structure exist, the page links to a review of the chronology of that particular structure. If no previous versions exist, the link will not appear. Other options that appear on the Explore page are described below.

  • View Structure -- Provides still and interactive views of the structure. Still images at different sizes and resolutions are produced on the fly with Molscript (Kraulis (1991) J. Appl. Cryst. 24, 946-950) and Raster-3D (Merritt and Bacon (1997) Methods in Enzymology 277, 505-524). Images are displayed in a standard orientation (right-hand frame x-axis horizontal, y-axis vertical, looking down the z-axis) and use the author-deposited secondary structure assignments to denote secondary structure as ribbons or cylinders. Interactive views use a standard VRML browser and the Molscript VRML output option (Kraulis loc. cit.), Rasmol (R. Sayle and E. Milner-White (1995) TIBS 20(9), 374), or the QuickPDB Java applet (Shindyalov and Bourne, unpublished).
  • Download Coordinates -- Lets users download atomic coordinates in PDB or mmCIF format either as uncompressed ASCII files or compressed with UNIX compress, gzip, or pkzip.
  • Structure Neighbors -- Provides direct access to reports of other structures exhibiting 3-D structure homology to the structure being explored. Access is provided to the databases of the common classification methods: CATH (Orengo, Michie, Jones, Jones, Swindells, and Thornton (1997) Structure 5(8), 1093-1108); CE (Shindyalov and Bourne (1998) Protein Engineering 11(9), 739-747); FSSP (Holm and Sander (1998) Nucl. Acids Res. 26, 316-319); SCOP (Murzin, Brenner, Hubbard, and Chothia (1995) J. Mol. Biol. 247, 536-540) and VAST (Gibrat, Madej and Bryant (1996) Current Opinion in Structural Biology 6, 377-385).
  • Geometry -- Provides a tabular and graphical (requires Rasmol) representation of the stereochemistry of the structure. Both are color-coded to indicate deviation from the standard values reported by Engh and Huber (1991) Acta Cryst. A47, 392-400.
  • Sequence Information -- Reports the size and molecular weight of each chain in the macromolecule as well as the sequence and, in the case of protein chains, the secondary structure assignment according to Kabsch and Sander (1983) Biopolymers 22(12), 2577-2637.
  • Previous Versions -- Provides a graphical summary of the release and withdrawal dates of previous versions of a structure and a tabular comparison of the different features within each version. Stereochemical comparisons between each version of the structure are also presented.
  • Crystallization Information -- Reports data where available from the Biological Macromolecular Crystallization Database (BMCD; Gilliland et al. (1994) Acta Cryst. D50, 408-413). These data include details of the crystal(s), crystallization conditions, and references.
  • Other Sources -- Provides pointers to other relevant information available via the Web. Again, this is a dynamic list. What appears depends on the structure in question.
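
Coordinate files downloaded in gzip-compressed PDB format can be read without unpacking them first. The following Python sketch shows one way to do so. The function name read_atoms is hypothetical; the column positions are those of the fixed-column ATOM and HETATM records of the PDB format (atom name in columns 13-16; x, y, and z in columns 31-38, 39-46, and 47-54).

```python
import gzip


def read_atoms(path):
    """Return a list of (atom name, (x, y, z)) from a PDB-format file.

    Files whose names end in ".gz" are decompressed on the fly.
    """
    opener = gzip.open if path.endswith(".gz") else open
    atoms = []
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith(("ATOM  ", "HETATM")):
                name = line[12:16].strip()   # columns 13-16
                xyz = (
                    float(line[30:38]),      # x, columns 31-38
                    float(line[38:46]),      # y, columns 39-46
                    float(line[46:54]),      # z, columns 47-54
                )
                atoms.append((name, xyz))
    return atoms
```

A call such as read_atoms("entry.pdb.gz"), where the file name is a placeholder for a downloaded file, returns the atoms in file order. Files compressed with UNIX compress or pkzip are not handled by this sketch.
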

Multiple Structures

Queries that return multiple structures are subject to three actions: filtering, downloading, and summarizing. Each is discussed separately.

  • Filtering -- A single structure, a subset of structures, or all structures can be chosen for downloading or summarizing. Alternatively, the selected list of structures can be used as input to a subsequent query, refining the search.
  • Downloading -- The selected list of structures can be downloaded as a set of PDB or mmCIF files in compressed or uncompressed formats. Alternatively, the sequences alone, taken from the PDB SEQRES records, can be downloaded in FASTA format.
  • Summarizing -- Reports on the selected list of structures can be generated. They can be presented as formatted Web pages (HTML) for printing or as tables with delimited fields suitable for loading into a spreadsheet or user-provided program (TEXT). Reports are available for cell constants and space group, primary citation, sequence, experimental technique, and refinement details (where applicable).
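
As an illustration of the sequence download, the following Python sketch builds FASTA text from the SEQRES records of a PDB file. It is a sketch under stated assumptions rather than the code used by the PDB: the function name seqres_to_fasta, the header layout >ID:CHAIN, and the use of X for any residue outside the 20 standard amino acids are choices made here for illustration, and only protein chains are handled.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def seqres_to_fasta(pdb_text, entry_id, width=60):
    """Build FASTA text, one record per chain, from SEQRES records."""
    chains = {}  # chain ID -> list of one-letter codes, in file order
    for line in pdb_text.splitlines():
        if line.startswith("SEQRES") and len(line) > 19:
            chain_id = line[11]             # column 12
            residues = line[19:70].split()  # residue names, columns 20-70
            codes = chains.setdefault(chain_id, [])
            # Assumption: anything that is not a standard amino acid becomes X.
            codes.extend(THREE_TO_ONE.get(res, "X") for res in residues)
    out = []
    for chain_id, codes in chains.items():
        seq = "".join(codes)
        out.append(f">{entry_id}:{chain_id}")  # hypothetical header layout
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n" if out else ""
```

The result is plain text with one header line per chain followed by the sequence wrapped at 60 characters, which most sequence analysis programs accept.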

Summary

The basic structure for making a query and interpreting results described here will form the basis of more powerful query capabilities in the future. Examples of such queries will be reported in future issues.

Acknowledgements

The RCSB is grateful to Drs. Steven Brenner and Paula Fitzgerald for beta testing v1.0 of SearchLite.

ADIT: AutoDep Input Tool

A new deposition software tool, called the AutoDep Input Tool (ADIT), has been developed by the RCSB. This tool will be made available at the RCSB Web site for testing during the initial part of the transition period and subsequently made available as an alternative to the existing AutoDep system.

ADIT is a Web-based deposition tool that builds a collection of HTML forms. Each form presents data items selected from a single mmCIF category. The scope of all possible data items available to ADIT is determined by the content of an underlying mmCIF data dictionary. Because this dictionary contains more than 1,600 definitions, the most important function of a view in ADIT is to present to the user only those data items that are relevant to deposition.

ADIT is undergoing beta testing by members of the crystallographic and NMR spectroscopy communities. The test period will extend until the system has been found to be robust by depositors and the archive; we expect it to last two to four months. Following testing, ADIT will be made available for general use. With a specialized view for annotation, ADIT is also used by the RCSB to process structures deposited to the PDB.

ADIT: Validation Server

The RCSB has released its validation server (http://pdb.rutgers.edu/validate/) at the RCSB Web site. The server can be used to check the format consistency of the coordinates and to perform a validation pre-check of the structural features and structure factors before a structure is deposited. It accepts coordinate and structure factor data and produces a report of geometrical and experimental checks. The content and presentation of the validation report are the same as those of the report produced during the deposition process, and the report can be used by the depositor prior to deposition or at any time during structure refinement.
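
To give a flavor of what a geometrical check involves, the Python sketch below compares one observed bond length with a target value and expresses the difference in standard deviations. It is not the validation server's procedure: the function names, the warning thresholds, and the target and sigma used in the example call are assumptions chosen for illustration, and real target values should be taken from a published parameter set.

```python
import math


def bond_deviation(atom_a, atom_b, target, sigma):
    """Return (observed length, deviation from target in units of sigma)."""
    observed = math.dist(atom_a, atom_b)  # Euclidean distance, Python 3.8+
    return observed, (observed - target) / sigma


def classify(z, warn=2.0, outlier=4.0):
    """Label a deviation; the default thresholds are illustrative assumptions."""
    size = abs(z)
    if size >= outlier:
        return "outlier"
    if size >= warn:
        return "warning"
    return "ok"


# Illustrative numbers only: two atoms 1.63 A apart, checked against a
# placeholder target of 1.525 A with a placeholder sigma of 0.021 A.
length, z = bond_deviation((0.0, 0.0, 0.0), (0.0, 0.0, 1.63), 1.525, 0.021)
print(f"{length:.3f} A, {z:+.1f} sigma, {classify(z)}")
```

A report would repeat such a comparison for every bond and angle in the structure and list the values that fall outside the chosen thresholds.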

Tutorials are available online.