Two Workshops on Structural Genomics

Argonne National Laboratory
by Andrzej Joachimiak and John Westbrook

The PDB participated in two workshops held at Argonne National Laboratory in November. These workshops brought together researchers working in bioinformatics; high throughput cloning, expression, and purification; protein crystallization; protein structure determination and validation; protein structure prediction; and databases. All of these fields are stages in a "pipeline" that proceeds from gene sequence to protein structure. In the first workshop, entitled "High Throughput Methods for Structural Genomics", Helen Berman presented the PDB plans for enabling structural genomics.

The second workshop, "Rapid Structure Determination at 3rd Generation Synchrotron Sources", was intended to demonstrate experimentally the unique capabilities of 3rd generation sources through hands-on data collection, analysis, and structure determination. The experimental part was complemented by short seminars and discussions with experts. John Westbrook gave a practical demonstration of the PDB plans for enabling structural genomics. Under expert guidance, attendees worked in teams to collect and process diffraction data, learn how to analyze and evaluate the data, solve the structure using MAD analysis, automatically build a model, and refine the structure. Several alternative crystallographic software packages were used at the beamline for data processing, analysis, and structure determination. High-throughput methods were emphasized. MAD data were collected on four SeMet-labeled proteins, ranging from simple (16 kDa) to challenging (450 kDa) problems. All four structures were solved from the data collected during the three-day workshop.

The purpose of the demonstration was also to illustrate how data pipelining (the incremental collection of the data required for PDB deposition) could be used to automate virtually all aspects of PDB deposition. Although the demonstration focused on collecting information for deposition, the same techniques can be applied to collect and encode the experimental details that need to be captured during high throughput experiments.
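The idea of incremental collection can be illustrated with a minimal Python sketch. It is not the software used at the workshop: the class name, the stage names, the list of required items, and all numeric values are hypothetical placeholders chosen only to show how each step of a structure determination might add its own details to a single record that is later checked for completeness.

```python
from dataclasses import dataclass, field

# Hypothetical list of items a deposition might need; the real PDB
# requirements are far more extensive and are not reproduced here.
REQUIRED_ITEMS = ("wavelength", "resolution_high", "phasing_method", "r_work")


@dataclass
class DepositionRecord:
    """Accumulates data items as each pipeline stage finishes."""
    items: dict = field(default_factory=dict)
    provenance: dict = field(default_factory=dict)

    def add_stage(self, stage, values):
        """Merge the items reported by one stage and remember their source."""
        for name, value in values.items():
            self.items[name] = value
            self.provenance[name] = stage

    def missing(self):
        """Return the required items that no stage has supplied yet."""
        return [name for name in REQUIRED_ITEMS if name not in self.items]


# Placeholder values, for illustration only.
record = DepositionRecord()
record.add_stage("data_collection", {"wavelength": 0.9793})
record.add_stage("data_processing", {"resolution_high": 2.1})
record.add_stage("phasing", {"phasing_method": "MAD"})
print(record.missing())  # ['r_work']
record.add_stage("refinement", {"r_work": 0.21})
```

Because every item is captured at the step that produces it, nothing has to be re-entered by hand at deposition time, and the record can report at any point which details are still outstanding.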

Prior to the workshop, we worked with Wladek Minor and Zbyszek Otwinowski to develop an mmCIF extension dictionary to capture key data items from the data processing program HKL2000. This allowed us to collect information during data processing that could be pipelined directly into the deposition process. In prior work with the developers of X-PLOR and CNS, we had developed a macro to export the details of refinement in mmCIF format.
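As a rough illustration of what exporting details in mmCIF format involves, the following Python sketch writes a handful of data items as mmCIF-style name/value pairs inside a data block. It is neither the CNS macro nor the HKL2000 extension dictionary mentioned above. The function names are hypothetical, the item names and values are examples only, and the quoting rule is deliberately simplified (it does not handle values that themselves contain quote characters or span several lines).

```python
def format_value(value):
    """Render one value; '?' is the mmCIF marker for an unknown value."""
    if value is None:
        return "?"
    text = str(value)
    # Simplified rule: quote anything that is empty or contains whitespace.
    if text == "" or any(ch.isspace() for ch in text):
        return "'" + text + "'"
    return text


def to_mmcif(block_name, items):
    """Serialize a dict of item names to values as a single data block."""
    lines = ["data_" + block_name]
    width = max((len(name) for name in items), default=0)
    for name, value in items.items():
        lines.append(name.ljust(width) + "  " + format_value(value))
    return "\n".join(lines) + "\n"


# Example items with placeholder values, for illustration only.
example = {
    "_struct.title": "Example structure",
    "_reflns.d_resolution_high": 2.1,
    "_refine.ls_R_factor_R_work": None,
}
print(to_mmcif("demo", example))
```

A program that emits its results in this form at the end of each step lets later steps, and ultimately the deposition system, read them back without manual transcription.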

During the workshop, we demonstrated the full cycle of data collection, processing, refinement, and deposition at the PDB for two protein systems: the peptide-binding domain of a thermophilic chaperonin, and aldose reductase. The MAD data sets were processed using HKL2000. MAD phasing was accomplished with SOLVE, CNS, and SHARP, and autotracing with ARP/wARP. Refinement was performed using CNS. Details of each step in the structure determination were electronically captured and integrated with the coordinates of the deposited structure.

The success of this demonstration and the lessons that were learned will enable us to further improve our systems so that PDB deposition will become fully integrated into high throughput structure determination.