Looking at Structures: Dealing with Coordinates
The primary information stored in the PDB archive consists of coordinate files for biological molecules. These files list the atoms in each protein and their locations in 3D space, and they are available in several formats (PDB, mmCIF, XML). A typical PDB-formatted file includes a large "header" section of text that summarizes the protein, citation information, and the details of the structure solution, followed by the sequence and a long list of the atoms and their coordinates. The archive also contains the experimental observations that are used to determine these atomic coordinates.
When you start exploring the structures in the PDB archive, you will need to know a few things about coordinate files. The major topics are covered here.
ATOMs and HETATMs
A typical PDB format file will contain atomic coordinates for a diverse collection of proteins, small molecules, ions, and water. Each atom is entered as a line of information that starts with a keyword: either ATOM or HETATM. By tradition, the ATOM keyword is used to identify protein or nucleic acid atoms, and the HETATM keyword is used to identify atoms in small molecules. Following this keyword, there is a list of information about the atom, including its number in the file, its name, the name and number of the residue it belongs to, one letter to specify the chain (in oligomeric proteins), its x, y, and z coordinates, and an occupancy and temperature factor (described in more detail below).
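The fields in an atom record can be pulled apart with a few string slices. The following is a minimal sketch, assuming the fixed-column layout of the legacy PDB format; the names `AtomRecord`, `parse_atom_line`, and `EXAMPLE_LINE` are hypothetical, and the example line uses made-up coordinates rather than values from a real entry.

```python
from typing import NamedTuple


class AtomRecord(NamedTuple):
    record: str       # keyword: "ATOM" or "HETATM"
    serial: int       # atom number in the file
    name: str         # atom name, e.g. "CA"
    alt_loc: str      # alternate-location letter, "" if none
    res_name: str     # residue name, e.g. "VAL"
    chain_id: str     # one-letter chain identifier
    res_seq: int      # residue number
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float   # temperature factor


def parse_atom_line(line: str) -> AtomRecord:
    """Slice one ATOM/HETATM line by its fixed columns (0-based slices)."""
    return AtomRecord(
        record=line[0:6].strip(),
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        alt_loc=line[16].strip(),
        res_name=line[17:20].strip(),
        chain_id=line[21].strip(),
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        b_factor=float(line[60:66]),
    )


# Illustrative line only: the coordinates and B-value are invented.
EXAMPLE_LINE = "ATOM      1  N   VAL A   1      -2.900  17.600  15.500  1.00 20.00"

atom = parse_atom_line(EXAMPLE_LINE)
print(atom.record, atom.name, atom.res_name, atom.chain_id, atom.res_seq)
```

Slicing by column rather than splitting on whitespace matters here, because adjacent fields in real files can run together without a separating space.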
This information gives you a lot of control when exploring the structure. For instance, most molecular graphics programs enable you to color identified portions of the molecule selectively--for example, to pick out all of the carbon atoms and color them green, or to pick one particular amino acid and highlight it.
The left image shows myoglobin (PDB entry 1mbo) using the default representation in MBT Protein Workshop. It shows a ribbon diagram for the protein, and ball-and-stick for the small molecules. In the right image, we have changed the representation to show all atoms, using the information in each atom record to color the molecules differently. This clearly shows the heme group in bright red, and a bound oxygen molecule in turquoise.
Tip: By default, many molecular graphics programs do not display the water positions in a PDB file, even though they are often important to the function and interaction of biological molecules. Most of these programs have a way to display them, if you use their methods for atom selection.
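If you are working with the file directly rather than through a graphics program, the waters can be picked out by residue name. This is a small sketch, assuming waters are stored as HETATM records with the residue name HOH (the usual convention, though some files may use other names); `water_lines` and the sample lines, with their invented coordinates, are hypothetical.

```python
def water_lines(pdb_lines):
    """Return the HETATM lines whose residue name (columns 18-20) is HOH."""
    return [
        line for line in pdb_lines
        if line.startswith("HETATM") and line[17:20] == "HOH"
    ]


# Illustrative lines only: serial numbers and coordinates are invented.
SAMPLE_LINES = [
    "ATOM      1  N   VAL A   1      -2.900  17.600  15.500  1.00 20.00",
    "HETATM 1200 FE   HEM A 155      14.000  28.000   5.000  1.00 12.00",
    "HETATM 1300  O   HOH A 201      10.000  12.000  14.000  1.00 30.00",
]

print(len(water_lines(SAMPLE_LINES)), "water record(s) found")
```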
Chains and Models
Biological molecules are hierarchical, building from atoms to residues to chains to assemblies. Coordinate files contain ways to organize and specify molecules at all of these levels. As described above, the atom names and residue information are included in each atom record. The higher-order information is identified by keywords that separate blocks of atom records, such as TER and MODEL.
Protein and nucleic acid chains are specified by the TER keyword, as well as a one-letter designation in the coordinate records. The chains are included one after another in the file, separated by a TER record to indicate that the chains are not physically connected to each other. Most molecular graphics programs look for this TER record so that they don't draw a bond to connect different chains.
PDB format files use the MODEL keyword to indicate multiple molecules in a single file. This was initially created to archive coordinate sets that include several different models of the same structure, like the structural ensembles obtained in NMR analysis. When you view these files, you will see dozens of similar molecules all superimposed. The MODEL keyword is now also used in biological assembly files to separate the many symmetrical copies of the molecule that are generated from the asymmetric unit (for more information, see the tutorial on biological assemblies).
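The hierarchy described above can be recovered from a file by tracking the MODEL keyword and the chain letter in each atom record. The sketch below is one way to do it, not a complete parser: the function name `group_by_model_and_chain` is hypothetical, files without MODEL records are assumed to hold a single model numbered 1, and TER records are simply skipped because the chain letter already separates the chains.

```python
from collections import defaultdict


def group_by_model_and_chain(pdb_lines):
    """Return {model_number: {chain_id: [atom lines]}}."""
    models = defaultdict(lambda: defaultdict(list))
    current_model = 1  # assumed default when the file has no MODEL records
    for line in pdb_lines:
        record = line[0:6].strip()
        if record == "MODEL":
            # The model number follows the keyword, e.g. "MODEL        2".
            current_model = int(line[6:].split()[0])
        elif record in ("ATOM", "HETATM"):
            chain_id = line[21]  # one-letter chain designation
            models[current_model][chain_id].append(line)
    # Convert to plain dicts so missing keys raise instead of being created.
    return {m: dict(chains) for m, chains in models.items()}
```

For an NMR ensemble, the returned dictionary would have one entry per model, each holding the same set of chain letters.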
Two useful coloring schemes allow you to explore the different chains in any given PDB file. First, you may color each chain differently to show the packing of different chains in the molecule as shown in the bottom image. Then, you can color each chain using a rainbow of colors from one end of the chain to the other to highlight its folding characteristics as shown at the top. Both of these methods are available in most molecular graphics programs. The molecule shown here is hemolysin from PDB entry 7ahl.
Temperature Factors
If we were able to hold an atom rigidly fixed in one place, we could observe its distribution of electrons in an ideal situation. The image would be dense towards the center, with the density falling off farther from the nucleus. When you look at experimental electron density distributions, however, the electrons usually have a wider distribution than this ideal. This may be due to vibration of the atoms, or to differences between the many different molecules in the crystal lattice. The observed electron density will include an average of all these small motions, yielding a slightly smeared image of the molecule.
These motions, and the resultant smearing of the electron density, are incorporated into the atomic model by a B-value or temperature factor. The amount of smearing is proportional to the magnitude of the B-value. Values under 10 create a model of the atom that is very sharp, indicating that the atom is not moving much and is in the same position in all of the molecules in the crystal. Values greater than 50 or so indicate that the atom is moving so much that it can barely be seen. This is often the case for atoms at the surface of proteins, where long sidechains are free to wag in the surrounding water.
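To get a feel for what these numbers mean in terms of distance, a B-value can be converted into a root-mean-square displacement. The sketch below assumes the standard isotropic relation B = 8π²⟨u²⟩, which is not given in the text above, with B in Å² and the result in Å; the function name `b_to_rms_displacement` is hypothetical.

```python
import math


def b_to_rms_displacement(b_factor):
    """Return sqrt(<u^2>) in angstroms for an isotropic B-value in square angstroms.

    Assumes the isotropic relation B = 8 * pi^2 * <u^2>.
    """
    return math.sqrt(b_factor / (8 * math.pi ** 2))


# The two thresholds mentioned in the text.
for b in (10, 50):
    print(f"B = {b}: rms displacement of about {b_to_rms_displacement(b):.2f} angstroms")
```

Under this assumption, a B-value of 10 corresponds to a displacement of roughly 0.36 Å, and a B-value of 50 to roughly 0.80 Å.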
The example shown is from a myoglobin structure solved at 2.0 Å resolution (PDB entry 1mbi). Two histidine amino acids are shown. On the left is HIS93, which coordinates with the iron atom and is thus held firmly in place. It has B-values in the range of 15-20; notice how the contours nicely surround the whole amino acid, revealing a sharp electron density. On the right is HIS81, which is exposed on the surface of the protein and has higher B-values in the range of 22-74. Notice how the contours enclose a smaller space, showing a smaller region with high electron density for this amino acid, because the overall electron density is weakly smeared in the space around the contours. These pictures are created using the Astex viewer, which is available on the Structure Summary page for this PDB entry (just click the "EDS" link in the "Experimental Method" section).
The picture shows the whole molecule, with the atoms colored by the temperature factors. High values, indicating lots of motion, are in red and yellow, and low values are in blue. Notice that the interior of the protein has low B-values and the amino acids on the surface have higher values.
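Coloring by temperature factor amounts to mapping each B-value onto a color ramp. The sketch below is a simplified illustration, not the scheme used by any particular program: it assumes a linear ramp from blue to red (without the yellow band seen in the picture), and the function name `b_factor_to_rgb` and the default limits of 10 and 50 are choices made here for the example.

```python
def b_factor_to_rgb(b_factor, b_low=10.0, b_high=50.0):
    """Map a B-value to an (r, g, b) tuple on a linear blue-to-red ramp.

    Values at or below b_low are pure blue; values at or above b_high are pure red.
    """
    t = (b_factor - b_low) / (b_high - b_low)
    t = max(0.0, min(1.0, t))  # clamp so out-of-range values saturate
    return (round(255 * t), 0, round(255 * (1 - t)))


print(b_factor_to_rgb(15.0))  # a well-ordered interior atom, mostly blue
print(b_factor_to_rgb(74.0))  # a mobile surface atom, saturated red
```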
Tip: Temperature factors are a measure of our confidence in the location of each atom. If you find an atom on the surface of a protein with a high temperature factor, keep in mind that this atom is probably moving a lot, and that the coordinates specified in the PDB file are only one possible snapshot of its location.
Occupancy and Multiple Conformations
Macromolecular crystals are composed of many individual molecules packed into a symmetrical arrangement. In some crystals, there are slight differences between each of these molecules. For instance, a sidechain on the surface may wag back and forth between several conformations, or a substrate may bind in two orientations in an active site, or a metal ion may be bound to only a few of the molecules. When researchers build the atomic model of these portions, they can use the occupancy to estimate the amount of each conformation that is observed in the crystal. For most atoms, the occupancy is given a value of 1, indicating that the atom is found in all of the molecules in the same place in the crystal. However, if a metal ion binds to only half of the molecules in the crystal, the researcher will see a weak image of the ion in the electron density map, and can assign an occupancy of 0.5 in the PDB structure file for this atom. Occupancies are also commonly used to identify sidechains or ligands that are observed in multiple conformations. The occupancy value is used to indicate the fraction of molecules that have each of the conformations. Two (or more) atom records are included for each atom, with occupancies like 0.5 and 0.5, or 0.4 and 0.6, or other fractional occupancies that sum to a total of 1.
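The bookkeeping described here can be checked directly from the atom records. The following sketch collects the occupancy of every atom that carries an alternate-location letter (column 17 of the record), so that the values for each atom can be compared and summed; the function name `alt_conformation_occupancies` is hypothetical, and the fixed-column layout of the legacy PDB format is assumed.

```python
from collections import defaultdict


def alt_conformation_occupancies(pdb_lines):
    """Map (chain, residue number, atom name) -> {alt-loc letter: occupancy}.

    Only atoms that carry an alternate-location letter are included.
    """
    groups = defaultdict(dict)
    for line in pdb_lines:
        if line[0:6].strip() not in ("ATOM", "HETATM"):
            continue
        alt_loc = line[16]
        if alt_loc == " ":
            continue  # single conformation, nothing to compare
        key = (line[21], int(line[22:26]), line[12:16].strip())
        groups[key][alt_loc] = float(line[54:60])
    return dict(groups)


def occupancy_totals(pdb_lines):
    """Sum the occupancies over all conformations of each atom."""
    return {
        key: sum(by_alt.values())
        for key, by_alt in alt_conformation_occupancies(pdb_lines).items()
    }
```

A total well below 1 for an atom is not necessarily an error; it may simply mean that only part of the population was modeled.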
The two images shown are taken from the high-resolution structure of myoglobin in entry 1a6m: glutamine 8 is on the left, and tyrosine 151 on the right. In both cases, the depositors interpreted the experimental data as showing two conformations of the amino acid, with occupancies of 0.57 and 0.43 for the glutamine, and 0.5 for each of the tyrosine conformations. The blue contours surround the regions with high electron density, and the atomic model is shown in sticks. These pictures are created using the Astex viewer, which is available on the Structure Summary page for this PDB entry (just click the "EDS" link in the "Experimental Method" section).
The picture below shows the whole myoglobin molecule, along with all of the amino acids that have two conformations in the file.
Tip: When dealing with PDB entries that list multiple conformations for some atoms, you often need to pay close attention. It is not always possible to select just the "A" conformations and throw away the "B" conformations. You need to look carefully in each case and make sure that there are not any bad contacts between mobile sidechains.
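For reference, the naive selection that this tip warns about looks like the sketch below: it keeps atoms with no alternate-location letter plus those labeled with one chosen letter. The function name `select_conformation` is hypothetical, and the code performs no check for bad contacts, so its output still needs the case-by-case inspection described above.

```python
def select_conformation(pdb_lines, keep="A"):
    """Keep atoms with a blank alternate-location letter or the letter `keep`.

    Non-atom records (TER, MODEL, and so on) are passed through unchanged.
    No geometry is checked, so clashes between the kept sidechains are possible.
    """
    selected = []
    for line in pdb_lines:
        is_atom = line[0:6].strip() in ("ATOM", "HETATM")
        if is_atom and line[16] not in (" ", keep):
            continue  # atom belongs to a conformation we are discarding
        selected.append(line)
    return selected
```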