Release of Cleaned-up Citation Data and Ongoing Uniformity Work

As part of the data clean-up project led by the NIST-PDB team, the citation data previously introduced on the PDB beta site are now available on the main production sites. This follows the introduction last fall of reliable access to R-factor and resolution data. Extensive work has also been completed on ligand and source data.

All primary citations for PDB entries, as of July 1999, have been validated and, where necessary, corrected. This work involved verifying all primary citation data values (title, authors, journal, year, volume, pages) against the published literature, using either electronic or hardcopy journal resources. The citation data values were also put into a uniform format. Whenever possible, links have been added to PUBMED. At present, the primary citations are more than 95% complete.

Legacy PDB data files commonly stored R-factor and resolution data as free text. This storage format, together with the changes in conventions and definitions over the last several years, made it hard to support fast and accurate queries over these data. The R-factor and resolution information for all legacy data has been examined and tabulated. For more than 5% of the files, this work required consulting the original publication. The tabulated data are now used to improve the reliability of user queries.
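To illustrate why free-text storage makes querying difficult, the following is a minimal Python sketch of pulling resolution and R-factor values out of remark lines. It is not the procedure used by the PDB teams. The two regular expressions, the function name tabulate_remarks, and the sample remark wording are assumptions made for illustration; real legacy files used many more phrasings than these patterns cover.

```python
import re
from typing import Dict, Iterable, Optional

# Hypothetical patterns: they cover only a couple of common phrasings.
_RESOLUTION = re.compile(r"RESOLUTION\W+(\d+(?:\.\d+)?)\s*ANGSTROMS?", re.IGNORECASE)
_R_FACTOR = re.compile(r"R[- ](?:VALUE|FACTOR)\D*?(0?\.\d+)", re.IGNORECASE)


def tabulate_remarks(lines: Iterable[str]) -> Dict[str, Optional[float]]:
    """Return the first resolution and R-factor found in free-text remarks.

    A value of None means no pattern matched, so the entry would need
    manual review (for example, against the original publication).
    """
    result: Dict[str, Optional[float]] = {"resolution": None, "r_factor": None}
    for line in lines:
        if result["resolution"] is None:
            match = _RESOLUTION.search(line)
            if match:
                result["resolution"] = float(match.group(1))
        if result["r_factor"] is None:
            match = _R_FACTOR.search(line)
            if match:
                result["r_factor"] = float(match.group(1))
    return result


# Illustrative remark lines, not taken from a specific PDB entry.
sample = [
    "REMARK   2 RESOLUTION. 2.0  ANGSTROMS.",
    "REMARK   3 THE R VALUE IS 0.182 FOR 12000 REFLECTIONS",
]
row = tabulate_remarks(sample)
```

Entries for which either value comes back as None are the ones a pattern-based pass cannot resolve, which corresponds to the cases described above that needed a manual check.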

Legacy data hold ligand and hetero atom information in a non-uniform format. The PDB teams at Rutgers and NIST annotated the data and developed uniform names. To facilitate reliable searching, synonym lists were generated from the information in the PDB files and, in several cases, context-based commercial and popular names from publications and reviews were also added. Queries using this information are now available for beta testing.
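The idea of searching through synonym lists can be sketched as a lookup table that maps every known name to its uniform name. This is a minimal sketch only; the function names, the normalization rule, and the sample entry are hypothetical and do not describe the actual PDB query implementation.

```python
from typing import Dict, List, Optional


def normalize(name: str) -> str:
    """Lower-case a name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def build_synonym_index(ligands: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each normalized synonym (and the uniform name itself) to the uniform name."""
    index: Dict[str, str] = {}
    for uniform_name, synonyms in ligands.items():
        for name in [uniform_name, *synonyms]:
            index[normalize(name)] = uniform_name
    return index


def lookup(index: Dict[str, str], query: str) -> Optional[str]:
    """Return the uniform name for a query, or None if the name is unknown."""
    return index.get(normalize(query))


# Illustrative entry only; the synonym list is an assumption for this example.
ligands = {
    "ATP": ["adenosine triphosphate", "adenosine-5'-triphosphate"],
}
index = build_synonym_index(ligands)
uniform = lookup(index, "Adenosine  Triphosphate")
```

With an index of this kind, a query on a popular or commercial name resolves to the same uniform name as a query on the name stored in the file.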

The source information for all the legacy data has been annotated to follow the conventions and standards used by MEDLINE. These data are currently being implemented for beta testing.

Updating and annotating data is an ongoing process, and the RCSB greatly appreciates input from the user community. Users may send corrections to