Transition Plan
PLAN FOR THE TRANSITION OF THE PROTEIN DATA BANK (PDB) FROM BROOKHAVEN NATIONAL LABORATORY (BNL) TO THE RESEARCH COLLABORATORY FOR STRUCTURAL BIOINFORMATICS (RCSB)
Version 1.0
December 1, 1998
Prepared by the RCSB Project Team
An important goal for the coming year is to make the transition to the future PDB transparent. This detailed transition plan has been prepared by the RCSB at the request of the National Science Foundation (NSF). As changes approach, depositors and users will be notified in a variety of ways, including announcements on these pages. New components of the system will be introduced on a regular basis throughout the coming year.
We are looking forward to working with you in the years to come.
This document outlines the plans that have been made to effect the transition of management of the Protein Data Bank (PDB) from Brookhaven National Laboratory (BNL) to the Research Collaboratory for Structural Bioinformatics (RCSB). We describe the steps necessary for moving data processing and distribution from BNL to RCSB. The members of the RCSB Project Team wrote the document after an initial meeting with key personnel at BNL and representatives of the funding agencies, including the National Science Foundation, the National Institutes of Health and the Department of Energy. Further reviews by all participants from RCSB, BNL and the funding agencies have resulted in this document.
This document should be considered a work in progress and may indeed change as the transition year progresses. We welcome suggestions and questions from the community; these should be directed to info@rcsb.org. Our goal is to make the transition as easy as possible for all concerned as we move into a new era of the PDB.
The RCSB Project Team
Helen M. Berman, John Westbrook, Rutgers University
Gary Gilliland, Phoebe Fagan, National Institute of Standards and Technology
Peter Arzberger, Phil Bourne, University of California, San Diego
The Research Collaboratory for Structural Bioinformatics (RCSB) will assume the responsibility for the management of the Protein Data Bank (PDB) from Brookhaven National Laboratory (BNL). RCSB and BNL are committed to making this transfer of function seamless to the community. The transition period began October 1, 1998 and will conclude October 31, 1999. Announcements of the plan and on-going progress will be made at two Internet Web sites: http://www.pdb.bnl.gov/ and http://www.rcsb.org/. The community will be given two months' notice of any changes in the system.
The RCSB will assume the responsibilities for Deposition and Data Processing (Section 1.0) during the transition period. Each aspect of data processing will be tested. In the first phase, the RCSB will begin test processing of Layer 1 entries that have been produced by AUTODEP. On January 27, 1999, the RCSB will take over full responsibility for managing the archive and will become responsible for the processing of all entries received by the PDB via AUTODEP or any other means from that date forward. BNL will be responsible for the completion of the processing of all files received up to January 27, 1999. In January 1999, RCSB will begin beta tests of an alternative input tool, the AutoDep Input Tool (ADIT), which will eventually replace AUTODEP. After this tool has been found to be robust, it will be made available for general use. AUTODEP will continue to be supported by BNL throughout the transition period.
Concurrent with assuming the responsibilities for the archive, the RCSB will become the primary distribution point for Data Query and Distribution (Section 2.0). Beginning in November 1998, the RCSB will mirror the current BNL ftp site. In February 1999, the RCSB will become the primary ftp site and BNL will mirror this site. The BNL 3DB Browser and associated tools will continue to be available throughout the transition and will be supported by BNL. The RCSB query engines will be made available in three phases as described below. The full search capabilities described in the grant proposal will be available by October 1999.
Starting in December 1998, all mirror sites, sites currently establishing mirrors, class I PDBSA license holders, class II PDBSA license holders, PDB listserver subscribers, and PDB newsletter subscribers will be given appropriate notification of any changes. New arrangements will be made as needed. BNL will continue to provide CD-ROMs during the transition period and the first RCSB CD-ROM will be released in October 1999.
The overall timeline for the transition is given in Figure 1.
1.0 DEPOSITION AND DATA PROCESSING
In this section, the specific transition plan for deposition and data processing is described. This plan describes the yearlong transition during which all deposition and primary data processing activities of the current BNL-PDB will be assumed by the RCSB.
In order to provide the greatest possible continuity to the depositing community, the existing AUTODEP PDB deposition system provided by BNL will be maintained in its current state until October 31, 1999. New deposition software tools (ADIT) will be provided by RCSB for testing during the initial part of the transition, and subsequently made available as an alternative to the existing AUTODEP system. Concurrent operation of both new and old deposition systems will be supported during the latter half of the transition, after which support for the current AUTODEP system will be discontinued. The transition between existing and new deposition tools will be coordinated with the European Bioinformatics Institute (EBI) and BioMagResBank (BMRB).
Starting January 27, 1999, files deposited at BNL or EBI will be transferred to Rutgers where these entries will be standardized and checked prior to public release. The entries will be released by the RCSB as soon as their processing is complete (within two weeks) unless the HOLD status is specified.
In summary, the key features of the deposition and data processing plan include:
- Gradual phasing out of AUTODEP depositions.
- Staged deployment of new RCSB deposition tools for X-ray and NMR data.
- Processing of all entries at the Rutgers site starting January 27, 1999, and hence the cessation of the release of Layer 1 entries after January 27, 1999.
- Completion of final processing of all structures deposited before January 1999 by BNL. Entries "on hold" until dates after October 1999 will be put into final, ready-to-release form by BNL.
- Coordination of data deposition activities with sites at EBI and BMRB.
Data Processing:
- Accept Layer 1 AUTODEP submissions from BNL and EBI sites starting January 27, 1999. Complete the data processing and annotation of entries prior to release. Completed files will be released within two weeks. Completed entries that are not on HOLD will be distributed by RCSB in PDB V2 format via ftp and the Web. CD-ROMs containing PDB V2 formatted files will be distributed by BNL. Files in the macromolecular Crystallographic Information File (mmCIF) format will be distributed by the RCSB via ftp and the Web.
Data Deposition:
- New deposition software for both X-ray and NMR data will be made available for testing. This tool is called ADIT (AUTODEP Input Tool). Laboratories interested in participating in the test program will be encouraged to use the new software and provide feedback during this beta test period, which will begin on January 27, 1999. The test period will extend until the system has been found to be robust by depositors and the archive. We expect this test period to last between two and four months. Following testing, ADIT will be made available for general use.
- Web-accessible validation servers will be installed at each RCSB site. The validation server will accept coordinate and structure factor data and produce a report of geometrical and experimental checks. The content and presentation of the validation report are the same as those of the report that will be produced during the deposition process.
Access and Documentation:
- Provide a tape copy of all on-line file systems associated with data deposition or data processing (received October 22, 1998).
- Provide for ongoing routine backup and archiving operations during the transition (under discussion).
- Provide initial statistics for structures in preparation, in processing, on hold or in queue (received October 15, 1998).
- Provide statistics on an on-going weekly basis of new structures deposited via AUTODEP (access to system provided October 19, 1998).
- Provide RCSB with documentation and technical descriptions for existing data processing procedures. This should include directions for finding all of the information about any deposition: data files, AUTODEP-generated files, depositor correspondence, manuscripts or preprints, corrections or revisions, and annotator notes (received October 29, 1998).
- Provide RCSB with access to AUTODEP software and any auxiliary software supporting data deposition, data processing, revision control, and archiving operations (received October 19, 1998).
- Provide RCSB with documentation for AUTODEP (received October 29, 1998).
- Provide RCSB with access to all data and software used for data deposition and processing operations (received October 22, 1998).
Data Deposition and Data Processing:
- Minimally modify existing AUTODEP deposition and data processing procedures at PDB to forward deposited data directly to the Rutgers site starting January 27, 1999 (discussions are currently in progress).
- Continue to maintain AUTODEP throughout the transition period. No changes other than the correction of coding errors will be made. These changes will be reported to RCSB.
- New logins to AUTODEP will be turned off 45-60 days prior to October 1999.
- Corrections to final released entries will be forwarded by BNL to RCSB.
- For those structures in the queue, a monthly timetable for completing the processing of these entries during the transition period will be defined. A monthly progress report of the backlog structures processed will be provided. Processed structures and associated materials will be transferred to the Rutgers site for release.
Once the transition plan has been approved, the RCSB and BNL will announce the implementation schedule to the depositor community via both Web sites. On November 1, 1998, RCSB deployed new deposition software for alpha testing. Validation servers will be deployed at each RCSB site on January 27, 1999. Beta testing of ADIT will begin at this time and will continue until a satisfactory level of processing is achieved. At the end of the transition period, support for AUTODEP will cease at BNL and all aspects of data processing will be transferred to the RCSB. If RCSB determines that there is an interaction between older entries being processed by BNL and current entries being processed by RCSB, and it is not possible to resolve these interactions, then RCSB will take responsibility for processing these older entries.
October 1998
- Gather information about all PDB data and software currently at BNL.
- Clarify data representation issues with EBI, BNL and BMRB for fields in current PDB files (in progress).
- Run annotator workshop for staff at Rutgers and NIST (Held October 15-16, 1998).
- Establish clear procedures for automatically forwarding data files from BNL to Rutgers (in progress).
November 1998
- Test processing of Layer 1 for selected X-ray and NMR files deposited at BNL starting November 1, 1998. EBI and the National Center for Biotechnology Information (NCBI) will review the final files generated by the RCSB (in progress).
- Do alpha test of ADIT for X-ray and NMR data within RCSB group (in progress).
- Track the flow of incoming data to BNL, as required to establish the workload and time schedule for RCSB assumption of BNL responsibilities (in progress).
- Recruit beta testers of ADIT from members of the international NMR and X-ray communities (in progress).
- Develop NMR Task Force guidelines and objectives. Define the combined requirements for a single RCSB/BMRB phase II ADIT for NMR (in progress).
December 1998
- Make public announcement of the transition schedule.
- Establish expected workload and time schedule for processing structures in queue by BNL and new processing by RCSB. This will be based on statistics provided by BNL and by observation of the data flow by RCSB.
- Continue data processing of Layer 1 depositions by RCSB.
- Complete draft of protocol for data exchange with EBI.
- Extend current ADIT for NMR by extending the underlying NMR dictionary. Draft protocol for exchange of data with BMRB.
January 1999
- Beta testing of ADIT will begin.
- RCSB will begin discussions with new deposition centers in Japan and Israel.
- BNL will automatically forward all incoming data files to RCSB. RCSB will process all incoming files.
- RCSB will make validation server available for public use at all RCSB sites.
- RCSB will meet with EBI staff to review the draft protocol for data exchange.
- BNL will continue to work on all files in the queue as of January 27, 1999 with the goal of completing data processing of all these files by October 1999.
February-October 1999
- RCSB will be responsible for all data processing with input coming from AUTODEP, ADIT and ASCII files.
- ADIT will be in final stages of testing and tuning. It will replace AUTODEP at some point during the transition period but no later than October 1999.
- BNL will continue to process all files in the queue as of January 27, 1999 and will continually forward these files to RCSB for distribution.
2.0 DATA QUERY AND DISTRIBUTION
The major component of the transition involves changing the primary distribution site from BNL to UCSD. This change will occur on February 3, 1999 when RCSB takes over the primary responsibility for data processing.
The transition year will see a three-phase rollout of the RCSB query capability with the focus being on rigorous testing and gradual introduction of the full query capability. During the transition it is expected that BNL will maintain their query capabilities (ftp archive, 3DB, PDB Lite) at the current level.
The RCSB, in collaboration with the NSF, will define a new policy for licensing of PDB data that will take effect in February 1999. All mirror sites, sites currently establishing mirrors, class I PDBSA license holders, and class II PDBSA license holders will be notified of these policy changes in December 1998. The changes will be broadly disseminated via the PDB listserver and via individual letters and email.
All PDB listserver subscribers and all newsletter subscribers will be given appropriate notification of all changes to the PDB within the transition period. BNL will also continue to provide CD-ROMs on a quarterly basis. The last CD-ROM produced by BNL will be the July 1999 CD-ROM, which will be released before October 1999.
Paper materials and other off-line media currently stored at the PDB will be inventoried and relocated to the NIST site. Preparations for this relocation from BNL to NIST will be made early in the transition. These materials will be stored and managed at NIST in a fashion consistent with long-term preservation.
Tape backups for archival and disaster recovery applications will be created and stored at the NIST site. In addition to routine daily tape backup operations, quarterly archival snapshots of all RCSB facilities will be created and preserved.
In summary:
- The RCSB will become the primary distribution site for PDB data on February 3, 1999.
- The current BNL ftp structure will be maintained.
- A three-phase rollout of query capability accessible via the Web will be introduced.
- An RCSB ftp archive will be introduced in November 1998.
- New licensing agreements will be established with all class I and class II PDBSA contract holders.
- New RCSB mirrors will be established.
- The current modes of information dissemination (listserver, newsletter, Web documentation, and CD-ROM) will gradually shift from BNL to RCSB.
- BNL and the RCSB will report planned changes to service using the modes described above.
- Establish with the NSF the form and wording of the licensing agreements to be enacted by the RCSB.
- Notify the community of all impending changes well in advance of implementation.
- Accept data from Rutgers for public distribution as of January 27, 1999.
- Update new RCSB mirrors and other mirrors on a weekly basis and as they come on-line.
- BNL will act as a mirror site as the data flow changes direction.
- Relocate the paper and the off-line media components of the PDB archive to NIST site.
- Start RCSB CD-ROM distribution in October 1999.
- Maintain periodic routine tape backups of all RCSB on-line facilities.
- Preserve quarterly tape backups of all RCSB on-line facilities.
- Continue to be responsible for BNL style mirrors that are currently in place until new contracts have been established and RCSB mirrors fully established.
- Provide all software and documentation associated with mirroring, 3DB browser, and PDB Lite (received October 22, 1998).
- Provide periodic snapshots of the complete PDB system (first received October 22, 1998).
- Provide contact information and state of development of mirrors in progress and any other mirrors not defined in the last PDB Newsletter (received October 16, 1998).
- Provide copies of all current PDBSA agreements, both class I and class II (received October 16, 1998).
- Provide current CD-ROM subscription list (complete demographic information) and notification of anyone requesting a subscription during the transition period (received October 16, 1998).
- Provide current list of subscribers to the PDB listserver and notification of new subscribers during the transition period (received October 16, 1998).
- Provide current list of subscribers to the PDB Newsletter both electronic (names and email addresses) and postal (full demographic information) (received October 16, 1998).
- Provide copies of all advisory notices ever signed (partial list for previous three years received October 22, 1998).
- Provide estimates of the type and organization of paper and off-line media in the current PDB archive (due October 29, 1998).
The query interface will be introduced in three phases. Phase I will be a free text search capability called SearchLite. Phase II will introduce a query interface which includes query by structure type and sequence. Phase III will be the fully customizable query interface described in the proposal. A schedule of testing and feedback accompanies each of these phases of release. A mirror of the ftp archive has already been made and this will be made public in December 1998. This will enable the many Internet sites worldwide to slowly change their pointers to the new resource. This will be an ongoing activity.
Mirrors of the UCSD RCSB site are already established at Rutgers and NIST. A resource document will be distributed to existing and new mirrors describing the resources needed to run an RCSB-style mirror. RCSB mirrors will be of two types: a basic ftp archive comprising all data files, and a complete system including all distribution and production databases and software. A letter will be sent describing future strategies with respect to mirroring.
New license agreements for 1999 will be distributed to all existing and new members in December 1998. The agreement will allow members to get their data and software from either BNL or RCSB until October 1999. Thereafter all data and software under license will be obtained from RCSB.
RCSB will work with each existing license holder to provide a smooth transition to the new source of data and software. Until the changeover to RCSB data and software, BNL will continue to provide the current level of service.
BNL will distribute their last Newsletter in October 1998. The first RCSB newsletter will appear in January 1999 to correspond with the change in primary data flow to RCSB. This change will be fully documented in the newsletter and elsewhere. Details of all changes to data query and distribution will be reported in advance from both BNL and RCSB Web sites.
October 1998
- Gather information as specified above from BNL (received).
- Invite members of the community as participants in a workshop to be held December 12, 1998 to discuss the development of the database resource (done).
- Recruit beta testers of the query interface from various niches of the user community (in progress).
- Implement mirrors at NIST and Rutgers (done).
- Obtain CD-ROM subscription list from BNL (received).
- Send tape copies of the current PDB site to the NIST site.
November 1998
- Resolve licensing issues with NSF (in progress).
- Implement ftp site as ftp.rcsb.org (completed).
- Begin alpha testing of Phase I SearchLite (in progress).
- Draft mirror resource document.
December 1998
- Notify PDBSA class I and II license holders of impending changes and begin new license agreements (to be effective February 1, 1999).
- Notify all mirror sites of potential changes from BNL to RCSB style mirrors.
- Hold Database Resource Development Workshop on December 12, 1998.
- Begin Phase I SearchLite beta testing.
- Complete mirror resource document.
- Test archival tape backup operations at the NIST site.
January 1999
- Turn off gopher site at BNL.
- Release Phase I SearchLite.
- Begin Phase II alpha testing.
- Publish and distribute first RCSB newsletter.
- Notify all current PDB listserver subscribers of the new data flow and invite them to join the RCSB listserver.
- Obtain estimates of contents of archived paper and off-line computer media.
- Begin archival tape backup of the production RCSB site and of the Rutgers data and correspondence archive.
- Make plan for the installation of non-RCSB mirrors.
February 1999
- Begin Phase II beta testing.
- On-site inspection of archived hardcopy materials at PDB.
March 1999
- Release Phase II query interface.
- Begin Phase III internal testing.
- Begin site preparation for archive materials at NIST.
April 1999
- Publish plans for January 2000 CD-ROM distribution.
- First time-stamped backup of RCSB preserved at NIST. Backups are preserved for each subsequent quarter.
July 1999
- Create a test CD-ROM distribution for evaluation.
September 1999
- Complete site preparations for archive at NIST.
- Do on-site preparation at BNL and organization of paper and off-line computer media and ship these materials to NIST.
- Create CD-ROM masters for January 2000 release.
- Tape copies of the PDB site at the end of the transition stored at the NIST site.