Phase II of PDB Query

RCSB plans call for three releases of the PDB Web query interface in the first year of RCSB operation, that is, through October 1, 1999. Phase I provided a simple text search interface called SearchLite, as described in the previous newsletter. SearchLite is valuable because it covers all possible search terms contained in a PDB file, but it does not address questions like "What structure(s) has author Brown had a hand in solving since 1990?" Phase II provides this level of query capability through the "Comprehensive Query" option. Comprehensive Query enables you to customize the query form to include the query terms that you wish to use. For the above query, you would use the "Author" and "Deposition Date After" fields. Explicit query fields are available for:

  • General information (e.g., PDB HEADER fields, authors, deposition date)
  • Crystallographic information (e.g., resolution, space group, all cell parameters)
  • Experimental (refinement)
  • Sequence (complete sequence using FASTA, and short string)
  • Features (molecular weight, secondary structure content, EC number)
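
The way several explicit fields combine into one query can be sketched in a few lines of Python. This is only an illustration of the idea, not the RCSB implementation: the Entry record, the comprehensive_query function, and the sample entries are all hypothetical, and the sketch assumes that every field you fill in must match (a logical AND) and that author names are matched as case-insensitive substrings.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Entry:
    """Hypothetical, simplified record; not the actual PDB schema."""
    pdb_id: str
    authors: list[str]
    deposited: date


def comprehensive_query(entries, author=None, deposited_after=None):
    """Return entries matching every field that was filled in (assumed AND)."""
    results = []
    for entry in entries:
        if author is not None:
            wanted = author.lower()
            # Case-insensitive substring match against each author name.
            if not any(wanted in name.lower() for name in entry.authors):
                continue
        if deposited_after is not None and entry.deposited <= deposited_after:
            continue
        results.append(entry)
    return results


# Made-up entries for illustration only; the IDs are not real PDB IDs.
entries = [
    Entry("XXX1", ["Brown, A.", "Smith, B."], date(1992, 3, 14)),
    Entry("XXX2", ["Brown, A."], date(1988, 7, 1)),
    Entry("XXX3", ["Jones, C."], date(1995, 11, 30)),
]

# "Author" = brown, "Deposition Date After" = December 31, 1989
hits = comprehensive_query(entries, author="brown",
                           deposited_after=date(1989, 12, 31))
print([e.pdb_id for e in hits])  # prints ['XXX1']
```

Leaving a field empty (None in the sketch) simply removes that condition, which mirrors customizing the form to include only the terms you need.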

Part of the RCSB mandate is to better annotate structures, both those already in the database and those being deposited. The first step in this direction has been a structure classification into enzyme, protein-containing, DNA-containing, RNA-containing, carbohydrate-containing, and glycoprotein-containing entries. This classification can be included in a Phase II query.

Another feature of the Phase II query capability is the ability to determine the status of an entry. Since structures are generally processed within two weeks of receipt, the need for processing status is limited. What remains important is finding structures that are on hold and determining when they will become available. A separate query status page permits the user to search for entries on hold by name, PDB ID, or release date.

Phase II query will be available from May 15, 1999.

Phase III query, due for release later in the year, will extend these Phase II features and add the ability to query non-redundant sets of data based on both sequence and fold similarity.