Proteins can have various degrees of similarity. If two proteins show high similarity in their
amino acid sequence, it is generally assumed that they are closely related evolutionarily.
With increasing evolutionary distance the degree of similarity usually drops. Even if the
sequence similarity is low, proteins can still show similar function and have an overall
similar 3D structure. The detection of such remote similarities is important in order to
infer functional and evolutionary relationships between protein families and is a core
technique used in structural bioinformatics. The goal is to establish regions of structural
similarity between two or more molecules.
While protein sequence comparisons can be computed quickly, the calculation of protein
structure alignments is much more time-consuming. The RCSB PDB offers tools that allow
users to quickly identify protein sequence neighbors and run pairwise protein structure
comparisons. To help identify more distant 3D relationships, a pre-calculated set of
3D protein structure alignments is available through the 3D similarity tab.
Screenshot of a pairwise protein structure alignment. It has been calculated using the jCE algorithm, available
through the Protein Comparison Tool at the RCSB web site.
Representative protein domains are used because a true all vs. all comparison
would require too much CPU time.
The procedure for deriving the domain-split representatives is an extension of our protein-chain
sequence clustering approach.
To remove redundancy, we start with a 40% sequence identity clustering procedure.
All sequences in a cluster are ranked, and the cluster is represented by the protein chain at rank #1.
This is usually the chain with the highest resolution that has been determined by X-ray crystallography.
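The selection of a cluster representative can be illustrated with a minimal sketch. The `Chain` record, the chain identifiers, and the exact ranking rule (X-ray structures first, then best resolution) are assumptions made for illustration; the actual ranking used for the pre-calculated results may consider further criteria.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chain:
    chain_id: str                # hypothetical identifier, e.g. "CCCC.B"
    method: str                  # experimental method, e.g. "X-RAY" or "NMR"
    resolution: Optional[float]  # in Angstrom; None if not applicable


def rank_key(chain):
    # Assumed ordering: X-ray structures first, then lower (better) resolution.
    # The chain identifier is used only to break ties deterministically.
    is_xray = chain.method == "X-RAY"
    resolution = chain.resolution if chain.resolution is not None else float("inf")
    return (not is_xray, resolution, chain.chain_id)


def pick_representative(cluster):
    """Return the chain that ends up at rank #1 of a sequence cluster."""
    return sorted(cluster, key=rank_key)[0]


# One hypothetical 40% sequence identity cluster.
cluster = [
    Chain("AAAA.A", "NMR", None),
    Chain("BBBB.A", "X-RAY", 2.4),
    Chain("CCCC.B", "X-RAY", 1.6),
]
representative = pick_representative(cluster)
print(representative.chain_id)
```

In this example the X-ray chain with the best resolution is chosen, and the other chains of the cluster would be represented by it in the database searches.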
If the representative chain consists of multiple domains,
each of those domains is included in database searches. If available, the domain assignment as provided
by SCOP 1.75 is used.
Otherwise algorithmic domain assignments are computed, using the
Example: Try the Cyclodextrin glycosyl transferase 3BMV
Chains are grouped together in clusters with 40% sequence similarity and then ranked, and each cluster
is represented by the protein chain at rank #1. This is usually
the chain with the highest resolution that has been determined by X-ray crystallography.
If the PDB chain being accessed is not the
representative, the results for the representative chain are loaded.
At present, the systematic comparisons contain about 1 billion
pairwise alignments. The bulk of these have been calculated on the
Open Science Grid. A technical report describing the details of how these
calculations were run is available from
At present, weekly updates for new structures are calculated using RCSB servers.
The 3D similarity tab shows the results of the systematic
comparisons of the representative domains. The results can be sorted and
filtered based on various scores.
The screenshot above shows the summary results that are available for a database search. In order to obtain a
detailed view of the results, click on the PDB ID in a row.
Each column in this table can be sorted. The results can be filtered based on various criteria.
The table is sorted by P-value by default. Clicking on a column header changes the sort order.
Select the Filter Results icon to apply other filtering criteria.
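The same sorting and filtering can be reproduced on downloaded results. The sketch below is an illustration only: the rows are made-up values, the hit names are placeholders, and the column names simply follow the ones listed for the results table (P-value, Score, RMSD).

```python
def filter_rows(rows, max_p_value=None, max_rmsd=None):
    """Keep only rows that satisfy every threshold that was given."""
    kept = []
    for row in rows:
        if max_p_value is not None and row["P-value"] > max_p_value:
            continue
        if max_rmsd is not None and row["RMSD"] > max_rmsd:
            continue
        kept.append(row)
    return kept


def sort_rows(rows, column="P-value", descending=False):
    """Return a new list sorted by one column (P-value ascending by default)."""
    return sorted(rows, key=lambda row: row[column], reverse=descending)


# Made-up example rows; real rows would come from the results table.
rows = [
    {"PDB.Chain": "hit_1", "P-value": 3.0e-4, "Score": 210.0, "RMSD": 3.4},
    {"PDB.Chain": "hit_2", "P-value": 1.0e-9, "Score": 540.0, "RMSD": 1.8},
    {"PDB.Chain": "hit_3", "P-value": 2.0e-6, "Score": 330.0, "RMSD": 2.9},
]

by_p_value = sort_rows(rows)
significant = sort_rows(filter_rows(rows, max_p_value=1e-5),
                        column="Score", descending=True)

print([row["PDB.Chain"] for row in by_p_value])
print([row["PDB.Chain"] for row in significant])
```

Sorting by P-value in ascending order puts the most significant hits first, which matches the default order of the table.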
The pairwise view of a structure alignment can be used to investigate the relationship between protein sequence and
structure by relating the sequence representation of the alignment to the 3D display in Jmol.
Regions can be selected in the bottom sequence display to see where they are located in the 3D Jmol display.
The source code of the RCSB Protein Comparison Tool is available for download and local installation from
The source code is available as open source under the LGPL license via the BioJava
project and is hosted on GitHub.
A tab-separated file containing the results for all structural representatives is available for download via FTP.
Note: The file has a compressed size of several hundred MB.
The all vs. all structural similarity results table for a representative chain can be downloaded in XML.
For example, this returns Rank, PDB.Chain, Description, P-value, Score, RMSD, Len1, Len2, %Sim1, and %Sim2 for
Note: A maximum of 2000 rows can be returned through this URL. To fetch all approx. 17,000 results for a chain, you need
to page through the results using the page parameter.
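Paging through the results could be sketched as follows. The function `fetch_page` is a hypothetical callable standing in for one request with a given value of the page parameter; the actual URL is not reproduced here, and whether page numbering starts at 0 or 1 is an assumption that should be checked against the service.

```python
def fetch_all(fetch_page, rows_per_page=2000, first_page=1):
    """Collect rows page by page until a page comes back incomplete."""
    results = []
    page = first_page
    while True:
        batch = fetch_page(page)
        results.extend(batch)
        # A page with fewer rows than the maximum must be the last one.
        if len(batch) < rows_per_page:
            break
        page += 1
    return results


def fake_fetch_page(page, total=17000, rows_per_page=2000):
    """Stand-in for a real request: returns row indices instead of rows."""
    start = (page - 1) * rows_per_page
    return list(range(start, min(start + rows_per_page, total)))


all_rows = fetch_all(fake_fetch_page)
print(len(all_rows))
```

With about 17,000 results and 2000 rows per page, nine requests are needed: eight full pages and one final partial page.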
A number of algorithms are provided for structural comparison. The precalculated results are based on FATCAT-rigid.
The downloadable Protein Comparison Tool can use CE, CE-CP, FATCAT-rigid, and FATCAT-flexible for structural
comparisons, as well as the Smith-Waterman algorithm for sequence alignment. The website offers additional services
for pairwise alignments, including TM-align, TopMatch, and Dali through external servers.
The all vs. all comparisons are based on jFATCAT, a Java port of the original FATCAT algorithm:
Yuzhen Ye & Adam Godzik (2003)
Flexible structure alignment by chaining aligned fragment pairs allowing twists.
Bioinformatics 19 (suppl. 2): ii246-ii255
Two flavors of jFATCAT are available. FATCAT-rigid uses a rigid-body superposition to align the two
structures. FATCAT-flexible introduces 'twists' between different parts of the proteins which are
superimposed independently. This is ideal for proteins which undergo large conformational shifts, where a global
superposition cannot capture the underlying similarity between domains. For instance, the structures of calmodulin
with and without calcium bound can be much better aligned with FATCAT-flexible than with one of the rigid alignment
algorithms. The downside of this is that it can lead to additional false positives in unrelated structures.
The RCSB PDB Protein Comparison Tool also provides jCE, a Java port of the CE algorithm:
I.N. Shindyalov, P.E. Bourne (1998)
Protein structure alignment by incremental combinatorial extension (CE) of the optimal path.
Protein Eng 11: 739-747
CE performs a rigid-body superposition of the proteins, similar to FATCAT-rigid.
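The RMSD value reported for such an alignment measures how far the aligned atom pairs are apart after superposition. A minimal sketch of that calculation is given below; it assumes that the two coordinate lists contain the already superimposed positions of the aligned residues (for example the C-alpha atoms) in the same order, and it does not compute the superposition itself.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between paired 3D coordinates.

    Both lists must hold (x, y, z) tuples for the aligned positions,
    in the same order, after the structures have been superimposed.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    if not coords_a:
        raise ValueError("at least one aligned position is required")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Two made-up fragments that are offset by exactly 1 Angstrom along z.
fragment_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
fragment_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(fragment_a, fragment_b))
```

A lower RMSD over a similar number of aligned positions indicates a closer structural match.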
The tool also provides CE with Circular Permutations (CE-CP).
CE and FATCAT both assume that aligned residues occur in the same order in both
proteins (i.e., they are both sequence-order dependent algorithms). In proteins
related by a circular permutation, the N-terminal part of one protein is related
to the C-terminal part of the other, and vice versa. CE-CP allows circularly
permuted proteins to be compared. For more information on circular
permutations, see the
Example: Circular permutation of concanavalin A and peanut lectin:
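The idea of a circular permutation can be illustrated on plain strings. The sketch below is a toy model and not the CE-CP algorithm: it only finds exact rotations of short made-up sequences, whereas CE-CP compares 3D structures and tolerates gaps and mismatches. It relies on the observation that every rotation of a sequence appears as a substring of that sequence written twice in a row.

```python
def circular_shift(seq_a, seq_b):
    """Return k such that seq_b == seq_a[k:] + seq_a[:k], or None.

    A sequence-order dependent comparison would only accept k == 0.
    """
    if len(seq_a) != len(seq_b):
        return None
    # Every rotation of seq_a is a substring of seq_a + seq_a.
    index = (seq_a + seq_a).find(seq_b)
    return index if index != -1 else None


original = "ABCDEFG"
permuted = "DEFGABC"   # the N-terminal "ABC" has moved to the C-terminus

shift = circular_shift(original, permuted)
print(shift)
print(original[shift:] + original[:shift] == permuted)
```

In this toy case the permutation point lies after the third position, so the first three residues of one sequence correspond to the last three residues of the other.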
We also provide the jCE and jFATCAT source code used by
the RCSB PDB Protein Comparison Tool, free of charge, for local
execution. The source code is available from http://source.rcsb.org
Want more control in using structure alignment algorithms? Would you like to better understand how the algorithms
work by trying different parameter sets? The new jCE/jFATCAT user interface supports manipulation of low-level
This option is for experts who have a basic understanding of the alignment algorithms.
An example is the maximum gap size parameter G used during the extension of Aligned Fragment Pairs of the CE
By default the parameter is set to 30, a trade-off between performance and result accuracy. For the protein pair 1CDG.A
the default parameters can't identify the whole TIM barrel that is in common between the two chains. Removing the
on the parameter G (by setting it to 0) increases the calculation time, but gives an alignment that is 25
Tip: To change parameters, launch the
Align custom files menu and click the Parameters
The Structure Alignment Tool also provides functionality for PDB-wide structural searches. This systematically
compares a query structure against all representative structures in the PDB. Comparisons can be made to either
representative chains or domains, as described above.
To do a PDB-wide structure alignment, use the 'Database Search' panel of the Structure Alignment Tool. The selected
output directory will be used to store results. These consist of individual alignments in compressed XML format, as
well as a tab-delimited file of similarity scores and statistics. The statistics are displayed in an interactive
results table, which allows the alignments to be sorted. The 'Align' column allows individual alignments to be
visualized with the alignment GUI.
RCSB PDB Comparison Tool Reference
Andreas Prlic, Spencer Bliven, Peter W. Rose, Wolfgang F. Bluhm, Chris Bizon, Adam Godzik, Philip E. Bourne (2010)
Pre-calculated protein structure alignments at the RCSB PDB website.
Bioinformatics 26: 2983-2985
The RCSB PDB is funded by a grant (DBI-1338415) from the
National Science Foundation, the
National Institutes of Health, and the
US Department of Energy.