New PDB Web Tools
The RCSB has developed a set of query and structure reporting tools for use with the PDB. These tools will continue to evolve with each release of the RCSB PDB Web site. Two sets of upwardly compatible enhancements to the query capability are planned for release later in 1999.
A user accesses the PDB through the Web by making a query and receiving a result. These actions are shown on the left- and right-hand sides of Figure 1, respectively. Each of these two components will evolve with every new release of the Web interface. This article discusses only version 1.0, which is available now. Future newsletters will discuss new versions as they become available. To simplify the discussion, the query and subsequent result are discussed separately.
Figure 1. PDB Query and Result Overview
Version 1.0 of the query interface is a keyword query: a user supplies one or more keywords, and all structures that contain those keywords are returned. The search for keywords is performed on the contents of the PDB files. To make a more specific query, a user may pose the query to a subset of PDB record types. For example, a search for Jones will find many structures for which Jones is not an author but which, for example, were solved using the map-fitting programs developed by Alwyn Jones. Restricting the search field to just author returns structures for which Jones was an author according to PDB JRNL and REMARK 1 records.
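The effect of restricting a keyword search to particular record types can be illustrated with a minimal Python sketch. This is not the RCSB implementation: the function names and the two toy entries are invented for illustration, and the sketch assumes only that each line of a PDB file starts with a record name in its first six columns and that REMARK lines carry a remark number after it.

```python
# Illustrative sketch only; names and sample records are hypothetical.

def record_type(line):
    """Return the record name (columns 1-6); for REMARK lines, add the remark number."""
    name = line[:6].strip()
    if name == "REMARK":
        number = line[6:10].strip()
        return f"REMARK {number}" if number else "REMARK"
    return name


def matches(pdb_text, keyword, record_types=None):
    """True if keyword occurs in any line, or only in lines of the given record types."""
    keyword = keyword.lower()
    for line in pdb_text.splitlines():
        if record_types is not None and record_type(line) not in record_types:
            continue  # skip lines outside the restricted search field
        if keyword in line.lower():
            return True
    return False


# Record types consulted for an author search, as stated in the text.
AUTHOR_RECORDS = {"JRNL", "REMARK 1"}

# Two invented entries: in ENTRY_A Jones is an author; in ENTRY_B the name
# appears only in a refinement remark.
ENTRY_A = "JRNL        AUTH   A.B.JONES,C.D.SMITH"
ENTRY_B = (
    "JRNL        AUTH   C.D.SMITH\n"
    "REMARK   3   MODEL BUILT WITH A PROGRAM BY JONES ET AL."
)

for label, entry in (("A", ENTRY_A), ("B", ENTRY_B)):
    print(label, matches(entry, "Jones"), matches(entry, "Jones", AUTHOR_RECORDS))
```

In this sketch the unrestricted search matches both entries, while the author-restricted search matches only the entry in which Jones appears in a JRNL record.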
Multiple keywords can be used as part of the same query. A query for protein kinase (without quotes) will return all structures that contain both of the terms "protein" and "kinase." Since "protein" is a general term and appears in nearly all structures, this effectively becomes a search for "kinase" and will return, for example, "histidine kinase," which is not a member of the protein kinase family. Using the search term "protein kinase" (in quotes) requires the keywords to be contiguous, which will return a set of structures closer to those belonging to the protein kinase family. However, it will also return structures that contain the phrase "protein kinase inhibitor," which may not be desired. There is no NOT clause at present with which to refine this query and exclude the keyword "inhibitor." Results from such queries can be manually trimmed or added to as desired.
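The difference between an unquoted multi-keyword query and a quoted phrase can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the actual search engine: it assumes case-insensitive, whole-word matching, and the function names and sample descriptions are invented.

```python
import re

# Illustrative sketch only; matching rules and sample texts are assumptions.

def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches_all_keywords(text, query):
    """Unquoted query: every keyword must occur somewhere in the text."""
    words = set(tokenize(text))
    return all(term in words for term in tokenize(query))


def matches_phrase(text, phrase):
    """Quoted query: the keywords must occur contiguously and in order."""
    words = tokenize(text)
    target = tokenize(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


# Invented descriptions standing in for the text of three PDB entries.
SAMPLES = [
    "histidine kinase domain of a signaling protein",
    "catalytic subunit of a protein kinase",
    "complex with a protein kinase inhibitor",
]

for text in SAMPLES:
    print(
        matches_all_keywords(text, "protein kinase"),
        matches_phrase(text, "protein kinase"),
        text,
    )
```

Here the unquoted query matches all three samples, including the histidine kinase, whereas the quoted phrase rejects the histidine kinase but still accepts the inhibitor complex. That mirrors the behavior described above, which cannot be refined further without a NOT clause.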
Figure 1 shows the two types of result that a query can produce: a single structure result or a multiple structure result. For example, entering a specific PDB identifier will return a single structure, whereas a phrase such as "protein kinase" will return multiple structures. Each case is discussed separately.
A single structure result will return an Explore page. As the name suggests, the page provides the opportunity to further explore several different aspects of the structure, both from the PDB and elsewhere on the Internet. The Explore page presents summary information about the structure, including the release date, author names, compound name, and primary citation, and a dynamic list of options based upon what information is available for the specific structure. For example, if the structure is a crystal structure of a nucleic acid or nucleic acid complex, a link will appear to the Nucleic Acid Database Atlas page (NDB; Berman et al. (1992) Biophys J. 63(3), 751-9). The NDB Atlas Entry provides further summary information, hand-curated images, and highly curated coordinate files. Similarly, if a previous version or versions of a structure exist, the page links to a review of the chronology of that particular structure. If no previous versions exist, the link will not appear. Other options that appear on the Explore page are described below.
Queries that return multiple structures are subject to three actions: filtering, downloading, and summarizing. Each is discussed separately.
The basic structure for making a query and interpreting results described here will form the basis of more powerful query capabilities in the future. Examples of such queries will be reported in future issues.
The RCSB is grateful to Drs. Steven Brenner and Paula Fitzgerald for beta testing v1.0 of SearchLite.
A new deposition software tool, called the AutoDep Input Tool (ADIT), has been developed by the RCSB. This tool will be made available at the RCSB Web site for testing during the initial part of the transition period and subsequently made available as an alternative to the existing AutoDep system.
ADIT is a Web-based deposition tool that builds a collection of HTML forms. Each form presents data items selected from a single mmCIF category. The scope of all possible data items available to ADIT is determined by the content of an underlying mmCIF data dictionary. Because this dictionary contains more than 1,600 definitions, the most important function of an ADIT view is to present to the user only those data items that are relevant to deposition.
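The relationship between the dictionary, a view, and the resulting forms can be sketched as follows. This is a hypothetical illustration rather than ADIT's actual design: the function name and the choice of items in the view are assumptions, and the only property relied upon is that mmCIF item names have the form _category.item.

```python
from collections import defaultdict

# Illustrative sketch only; the view contents below are assumed, not ADIT's.

def build_forms(dictionary_items, view):
    """Group the dictionary items selected by a view into one form per mmCIF category."""
    forms = defaultdict(list)
    for item in dictionary_items:
        if item in view:
            # "_cell.length_a" belongs to category "cell"
            category = item[1:].split(".", 1)[0]
            forms[category].append(item)
    return dict(forms)


# A tiny stand-in for a dictionary holding more than 1,600 definitions.
DICTIONARY_ITEMS = [
    "_cell.length_a",
    "_cell.length_b",
    "_struct.title",
    "_atom_site.Cartn_x",
]

# A hypothetical deposition view: the items a depositor is asked to supply.
DEPOSITION_VIEW = {"_cell.length_a", "_cell.length_b", "_struct.title"}

FORMS = build_forms(DICTIONARY_ITEMS, DEPOSITION_VIEW)
for category, items in FORMS.items():
    print(category, items)
```

A different view over the same dictionary, such as one intended for annotation, would yield a different set of forms without any change to the dictionary itself.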
ADIT is undergoing beta testing by members of the crystallographic and NMR spectroscopic communities. The test period will extend until the system has been found to be robust by depositors and the archive. We expect this test period to last two to four months. Following testing, ADIT will be made available for general use. With a specialized view for annotation, ADIT is also used by the RCSB to process structures deposited to the PDB.
The RCSB has released its validation server (http://pdb.rutgers.edu/validate/) at the RCSB Web site. This validation server can be used to check the format consistency of the coordinates and to perform a validation pre-check of the structural features and structure factors before the structure is deposited. The validation server accepts coordinate and structure factor data and produces a report of geometrical and experimental checks. The content and presentation of the validation report are the same as those of the report produced during the deposition process, so the depositor can use it prior to deposition or at any time during structure refinement.
Tutorials are available online.