Brian Weitzner and Roland Dunbrack offer an OS X Dashboard widget that provides easy downloads of PDB files and quick access to the RCSB PDB Structure Explorer web page for any given PDB ID. All of the software offered by Roland's laboratory is available at dunbrack.fccc.edu.
Roland L. Dunbrack, Jr., Ph.D., Associate Professor, Program in Molecular and Translational Medicine, Fox Chase Cancer Center
Q. How do you use the PDB for your research? What do you think its value is?
A. My group works on developing methods and software for protein structure prediction and on the statistical study of protein structures required to do that well. The PDB is therefore at the center of all we do. For instance, we have used sets of thousands of structures to develop new statistical analyses of the protein backbone and of protein side chains for a new version of our backbone-dependent rotamer library. These are being used in our side-chain prediction program SCWRL4 [1], released in May 2009, and in developmental versions of the Rosetta program from David Baker's group. To train and test the accuracy of our structure prediction methods, we also require large sets of protein structures from the PDB.
Of course, large-scale study of protein structures, sometimes known as structural bioinformatics, serves many purposes, and many research groups worldwide do such work. So a few years ago we developed the PISCES server [2,3], which allows users to produce lists of PDB entries, or of chains from PDB entries, that meet user-selected criteria such as resolution, R-factor, and maximum sequence identity between any two proteins in the output list. A very useful feature is that the user can input his or her own list of PDB chains, say, all kinases or all PDB entries with available structure factors, and get a culled list from those with the desired resolution and sequence identity cutoffs. We have used this feature ourselves many times.
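The culling described above can be illustrated with a short Python sketch. This is not the PISCES implementation, which the interview does not describe; it is a simple greedy filter written under stated assumptions. The `Chain` record, the `cull_chains` function, the default cutoffs, and the caller-supplied `identity` function are all hypothetical names chosen for illustration. Sequence identity is assumed to be available from some external alignment step.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Chain:
    """One PDB chain with the quality fields used for culling (illustrative)."""
    chain_id: str      # hypothetical identifier format, e.g. "1AAA_A"
    resolution: float  # in angstroms; lower is better
    r_factor: float    # crystallographic R-factor; lower is better


def cull_chains(
    chains: Iterable[Chain],
    identity: Callable[[Chain, Chain], float],
    max_resolution: float = 2.0,   # assumed default, not a PISCES setting
    max_r_factor: float = 0.25,    # assumed default, not a PISCES setting
    max_identity: float = 0.30,    # fraction in [0, 1]
) -> List[Chain]:
    """Greedy culling: filter by quality, then keep a chain only if it is
    no more than max_identity identical to every chain already kept."""
    # Step 1: drop chains that fail the resolution or R-factor cutoff.
    candidates = [
        c for c in chains
        if c.resolution <= max_resolution and c.r_factor <= max_r_factor
    ]
    # Step 2: consider the best-quality chains first, so that when two
    # chains are too similar the better-resolved one is the one retained.
    candidates.sort(key=lambda c: (c.resolution, c.r_factor, c.chain_id))
    # Step 3: greedily accept chains that are dissimilar to all kept ones.
    kept: List[Chain] = []
    for c in candidates:
        if all(identity(c, k) <= max_identity for k in kept):
            kept.append(c)
    return kept


if __name__ == "__main__":
    # Toy input: made-up chains and a made-up identity table.
    toy = [
        Chain("1AAA_A", 1.5, 0.18),
        Chain("1BBB_A", 1.8, 0.20),
        Chain("1CCC_A", 2.8, 0.22),  # fails the resolution cutoff
        Chain("1DDD_A", 1.9, 0.21),
    ]
    similar = {frozenset(("1AAA_A", "1BBB_A")): 0.95}

    def toy_identity(a: Chain, b: Chain) -> float:
        return similar.get(frozenset((a.chain_id, b.chain_id)), 0.10)

    print([c.chain_id for c in cull_chains(toy, toy_identity)])
```

Passing in a user-supplied list, as in the kinase example above, corresponds here to calling `cull_chains` on that list alone. A greedy pass does not guarantee the largest possible output list; it only guarantees that no two retained chains exceed the identity cutoff.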
Q. The Fox Chase Cancer Center houses a variety of scientists and physicians. How does your research group interact with all of them?
A. It's part of the culture and history of my institution that we are a collaborative and congenial group of faculty. When I first got to Fox Chase, one of my colleagues showed me some functional results on mutants of an enzyme he worked on. These mutants were associated with homocystinuria in people, which can be a dangerous condition. Some patients were responsive to vitamin B6, the enzyme's cofactor, and others were not. So we made a model of the protein based on a homologue with 19% sequence identity, and found that the most unresponsive mutations were located in the active site, while the others were mostly on the surface. He was astonished that I could tell him this in the absence of a crystal structure of his protein. He gave one of our weekly faculty seminars and showed the model, and not long after I had people knocking on my door. We had some really nice collaborations in the next few years, and in 2003 the Center was awarded a grant from the Pew Charitable Trusts in Philadelphia to establish the Fox Chase Molecular Modeling Facility with one full-time staff member. That person's job is to use many kinds of publicly available modeling methods to model proteins and protein complexes, to help explain existing experimental data, and to generate new hypotheses and experiments. We interact frequently with the investigators during each project so that they can make the most use of the models and the analysis, and we provide text and images for papers and grant proposals.
In the 12 years I have been at Fox Chase, my group has worked with about 60 of my colleagues on over 120 different protein targets. Questions range from "where are the functional domains in this 2000-amino-acid cancer-associated protein?" to "where can I make a mutation to knock out one protein-protein interaction of my protein without affecting others?" Adrian Canutescu in the Facility developed a graphical user interface, MolIDE, to make basic homology modeling (searching the PDB, selecting a template, producing and editing the sequence-template alignment, and loop and side-chain modeling) very easy [4]. We are now extending it to complexes of proteins.
Q. In addition to your work in protein sequence analysis and structure prediction, you have been focused on the accuracy and representation of biological assemblies. Why do you think this is so important? How did this interest develop?
A. This interest developed largely from working with my colleagues. We had many cases where we needed to build complexes from dimers to octamers, and while we could do that with some manual intervention, it was sometimes a tedious process. We also had some cases where deleterious mutants were obviously in protein-protein interfaces, and we wanted it to be easier to make such models. Initially, I naively thought the available databases, the PDB itself and the Protein Quaternary Structure (PQS) server at the EBI, would have very similar sets of biological units for PDB entries. It turned out they agreed with each other only about 80% of the time. So we produced a database and software program, ProtBuD, to retrieve and present in a sortable table all the biological unit information from both sources for any query protein family [5]. The program also provides ligand information, making it easy to find perhaps the single entry among dozens in a protein family that has the biological ligand of interest, instead of having to search all the entries manually.
We were also working on a model of a sulfotransferase, SULT4A1, for one of my colleagues, and she showed us a paper that used crosslinking, protease digestion, and mass spectrometry of a dimer of this protein to locate residues involved in the homodimer interface. It turned out the interface was present in the two crystal structures then available for SULT family members when the paper was published (2001). In 2005, there were something like 12 different crystal forms of various family members, and we found the same dimer interface in every single one of them [6]. Only one of 20 or so PDB entries had the dimer interface annotated in the PDB. The rest were either monomeric or had a bunch of other different interfaces, not shared by any more than one crystal form. PQS had one or two interfaces correct, but only when the dimer was in the asymmetric unit. This led us to do a PDB-wide examination of crystals in protein families, to identify common interfaces [7]. Annotations in the PDB at that time were mostly from the authors or PQS, and we found that for families with a large number of crystal forms containing the same interface, about 90% of the entries had that interface in the biological assembly in the PDB. For the newly developed PISA program at the EBI, the number was about 95%. The PDB is now including the PISA annotations in its biological assemblies, which I think is a great idea. It gives users an opportunity to examine various hypothetical assemblies for any entry they may be interested in.
Q. DashPDB is a recent addition to a variety of programs and resources available from your website for download. Recently, you released an updated version of SCWRL, a program for predicting protein side-chain conformations. How do you think your software programs are being used?
A. I strongly believe in making the methods and programs developed by computational biology research groups available to the public, either as webservers or as downloadable software or, depending on the purpose, preferably both. There are many papers on method development with no software available, or with only a webserver, so that a user can submit only one manually entered request at a time, which limits the kind of studies that can be done. Sometimes software is available, but has to be compiled by the user, or is written such that the program is very finicky about complicated input files in ways that are not well documented. So we try to provide software that is easy to use, does what it says it does, and has good documentation. It's easier to do this if we think about it at the beginning of method development, rather than as an afterthought when a student or postdoc has created something that nobody else can use.
Our software gets used in many different ways. SCWRL is easy enough to use that I think groups interested in things other than just structure prediction use it. We have about 3300 licensees. MolIDE, which makes models with SCWRL from a query sequence, has about 1200 users. Since SCWRL is very fast, it gets used on some servers that provide sequence-template alignments for remote homologues (like FFAS) in order to make a quick model based on the alignment. I also hope our programs get used in the kinds of productive and very enjoyable collaborations we have been fortunate enough to have over the last 12 years.
1. G. G. Krivov, M. V. Shapovalov & R. L. Dunbrack, Jr. (2009) Improved prediction of protein side-chain conformations with SCWRL4. Proteins 77:778-795.
2. G. Wang & R. L. Dunbrack, Jr. (2003) PISCES: a protein sequence culling server. Bioinformatics 19:1589-1591.
3. G. Wang & R. L. Dunbrack, Jr. (2005) PISCES: recent improvements to a PDB sequence culling server. Nucleic Acids Res 33:W94-98.
4. A. A. Canutescu & R. L. Dunbrack, Jr. (2005) MolIDE: a homology modeling framework you can click with. Bioinformatics 21:2914-2916.
5. Q. Xu, A. Canutescu, Z. Obradovic & R. L. Dunbrack, Jr. (2006) ProtBuD: a database of biological unit structures of protein families and superfamilies. Bioinformatics 22:2876-2882.
6. B. Weitzner, T. Meehan, Q. Xu & R. L. Dunbrack, Jr. (2008) An unusually small dimer interface is observed in all available crystal structures of cytosolic sulfotransferases. Proteins: Structure, Function and Genetics 75:289-295.
7. Q. Xu, A. A. Canutescu, G. Wang, M. Shapovalov, Z. Obradovic & R. L. Dunbrack, Jr. (2008) Statistical analysis of interface similarity in crystals of homologous proteins. J Mol Biol 381:487-507.