PDB Community Focus: Helen M. Berman

Helen M. Berman, Director of the RCSB Protein Data Bank, completed an AB degree in chemistry at Barnard College in 1964, and in 1967 received her PhD in natural science from the University of Pittsburgh, where she studied with George Jeffrey in the Department of Crystallography. In 1969, she went to the Institute for Cancer Research (ICR), Fox Chase Cancer Center in Philadelphia to work with Jenny Glusker. She became an Assistant Member of ICR in 1973 and then rose through the ranks to Senior Member. She was also an Adjunct Professor at the University of Pennsylvania and Director of Research Computing at Fox Chase. In 1989, she moved to Rutgers University, where she currently serves as a Board of Governors Professor of Chemistry and Chemical Biology.

Dr. Berman's crystallographic studies have focused on nucleic acids, protein-nucleic acid complexes, and collagen. She has also done systematic analyses of the hydration patterns of biological molecules, including nucleic acids and collagen. Since the earliest days of her career, she has been interested in establishing methods to collect and archive structural data so that systematic studies of the data could be facilitated. She was part of the original team that developed the PDB at Brookhaven National Laboratory in 1971, and in 1991 she founded the Nucleic Acid Database (NDB; http://ndbserver.rutgers.edu/). In 1998, she led the team of Research Collaboratory for Structural Bioinformatics (RCSB) members that won the contract to manage the PDB.

Throughout her career, Dr. Berman has been an active participant in the scientific community. She has served on numerous advisory boards for the National Science Foundation and the National Institutes of Health, and on journal editorial boards. She has served as President of the American Crystallographic Association (ACA) and has also held leadership positions in the Biophysical Society and the International Union of Crystallography (IUCr). She received the 2000 Biophysical Society Award for Distinguished Service and is a Fellow of the Biophysical Society and of the American Association for the Advancement of Science.

Under Dr. Berman's leadership, the RCSB began its second five-year period of PDB management in January 2004. During the first five years, the number of released structures in the archive had more than doubled, and the pace of depositions continues to increase at a steady rate.

The RCSB PDB staff solicited Dr. Berman's views on RCSB PDB's accomplishments to date, and her vision for the future.

Q: The PDB was born at a Cold Spring Harbor Symposium in 1971. What was that meeting like?

A: It was an enormously exciting meeting, especially for a young crystallographer. Virtually all the pioneers of the field were there, presenting the results of their research. A particularly vivid image I have is of a large group of people sitting on the grass in a circle around Max Perutz talking about hemoglobin.

Q: What has shaped the PDB the most since its beginnings as an archive containing seven structures?

A: There has been a progression of influences on the PDB. First, the focus was on getting structures into the PDB. In the 1970s, Tom Koetzle single-handedly wrote personal letters to every protein crystallographer asking them to participate. In the 1980s, the community of protein crystallographers began to organize under the leadership of Fred Richards to try to encourage people to deposit structures. The IUCr set up more formal committees to achieve the same thing. In 1989, guidelines were set forth requiring deposition and release of macromolecular structures. At the same time, the technology improved, making structure determination much faster. By the early 1990s, the number of depositions began to rise quickly and the problem of how to keep up with the data emerged. A backlog of structures began to build. The PDB was a victim of its own success. The use of modern data management methods as well as the development of an efficient team of annotators has helped to solve this problem. The challenges in the 2000s will be high-throughput structure analysis, large macromolecular assemblies, and the demand for archiving more information about each experiment and its results.

Q: You have been actively involved with the PDB since its beginning. What continues to draw you to this project?

A: The idea that by looking at groups of structures it will be possible to derive new knowledge has always been a compelling concept. I have done this in my own research and have always wanted to make it possible for others. If all the data are organized properly, it should be possible to mine them efficiently. If many people can do this on large data sets, it should be possible to learn about basic concepts, such as protein folding, and use the knowledge to create new drugs.

Q: Some people think that being involved with scientific infrastructure, such as the PDB, is not doing real science and is therefore less important. Do you agree?

A: Not at all. It is one of the most important things that I can do. To do it right requires knowledge of the data and the technology needed to collect and disseminate it. Once the infrastructure is in place, new science will emerge. Facilitating that process, seeing what emerges, and imagining what new science will become possible is what motivates me.

Q: The RCSB is composed of organizations in different parts of the country. How does the collaboration work?

A: That is complicated. Each site has its own set of projects that contribute to the PDB. However, each project must also interact with all the others. To make this work we have developed various computer-based forums. Daily communication by email and phone is critical, as are personal visits and personnel exchanges among the groups. Recently, we have begun to use video conferencing. Once a year we have a retreat that allows everyone to be together for a couple of days to talk about the various projects, to plan for the coming year, and for PDB staffers to get to know one another.

Q: Who makes up the PDB's user community?

A: It used to be only crystallographers, and later NMR spectroscopists, but now it has expanded to include biologists, computational biologists, educators, and students.

Q: How does the PDB interact with its users?

A: We have various electronic mail services that allow users to ask questions and bring problems to our attention. We attend many different meetings and participate in a variety of ways. At some meetings, we have an exhibit booth. We are also organizing workshops for the purpose of educating different parts of the community about what we do and how to use the various tools. Outreach is a key element of the PDB because it gives us the feedback we need to improve what we do.

Q: Recently, the RCSB formalized the ongoing collaboration with the Macromolecular Structure Database-European Bioinformatics Institute and PDBj (PDB Japan) to form the wwPDB. How does this affect the PDB?

A: The wwPDB was organized to make sure that the PDB remains a single archive. When users from around the world access a flat file with ID 1XXX, they can be assured that they will always get the same file. This organization will also help us develop new collaborations that will enhance the use of the PDB files. Science is international and wwPDB acknowledges that.

Q: Currently, 28 of your structure determinations are in the PDB. What has been your experience as a PDB depositor?

A: Watching my students deposit files in the PDB allows me to see how we can improve the process.

Q: You are active in other areas of research, including protein-nucleic acid interactions, structure determinations, and databases. Has this activity influenced your work with the PDB?

A: As a depositor and a user, I have both perspectives. I have used structural data to do systematic analyses of macromolecules. The need to have easy access to the data was a motivating force for me in helping to improve the PDB. As a user of the PDB, I can see how we can make it easier for others to use.

Q: What do you think the PDB will be like in the next 30 years?

A: In 1971, it was almost beyond our imagination that structure determination of proteins could be completed in a few days with the results instantly accessible on a desktop computer. But here we are, and we now know for sure that in the future there will be even more structures, new methods for structure determination, much larger structures, and more information about each structure. All aspects of the process will be fully automated. The really exciting thing to think about is what people will do with all the data. This will depend on the ingenuity of new generations of biologists, some of whom are not yet born, who will certainly find ways to use all this information and give us the ultimate knowledge about how molecules function.