To identify all transmembrane proteins in the PDB, we load
the manually annotated transmembrane dataset from
mpstruc (UC Irvine).
Mpstruc provides useful information about integral
membrane proteins whose crystallographic, or sometimes NMR, structures have
been determined to a resolution sufficient to identify TM helices of
helix-bundle membrane proteins (typically 4–4.5 Å).
The latest mpstruc data is downloaded on a weekly basis.
These manual annotations are extended using
our sequence clusters,
according to the following procedure:
Mpstruc annotates transmembrane proteins at the level of individual PDB entries. If the
reference mpstruc entry contains only a single protein entity, this protein must be a
transmembrane protein. Therefore, any PDB chain sharing 90% sequence identity with this transmembrane
protein is assigned as a transmembrane protein as well, and shares the same
If the reference mpstruc entry contains multiple protein entities, it is necessary to identify
which of the entities are presumed to be transmembrane chains.
This is done in conjunction with UniProt annotations. Transmembrane protein entities are identified by
checking whether their corresponding UniProt sequence has annotations labeled as a transmembrane
or intramembrane region. For transmembrane entities, all members of the sequence
cluster (90% sequence identity) are programmatically inferred to be members of the same class of
transmembrane proteins by applying the above procedure for single entity mpstruc entries.
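The procedure above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual pipeline: the `Entity` class, the `MEMBRANE_FEATURES` labels, the function names, and the shape of the `clusters` mapping (cluster id to chain ids) are all hypothetical, and real UniProt feature labels and cluster data would need to be mapped onto them.

```python
from dataclasses import dataclass

# Hypothetical labels standing in for UniProt "transmembrane" and
# "intramembrane" region annotations.
MEMBRANE_FEATURES = frozenset({"transmembrane", "intramembrane"})


@dataclass(frozen=True)
class Entity:
    """A protein entity of a reference mpstruc entry (hypothetical model)."""
    entity_id: str
    cluster_id: int                            # its 90% sequence-identity cluster
    uniprot_features: frozenset = frozenset()  # feature labels from UniProt


def transmembrane_entities(entities):
    """Select the entities of one reference entry presumed to be transmembrane."""
    if len(entities) == 1:
        # Single-entity entry: the protein must be the transmembrane protein.
        return list(entities)
    # Multi-entity entry: keep entities whose UniProt sequence carries a
    # transmembrane or intramembrane region annotation.
    return [e for e in entities if e.uniprot_features & MEMBRANE_FEATURES]


def propagate(reference_entities, clusters):
    """Return every chain inferred to be transmembrane via cluster membership.

    `clusters` maps a cluster id to the chain ids in that cluster (assumed shape).
    """
    annotated = set()
    for entity in transmembrane_entities(reference_entities):
        annotated.update(clusters.get(entity.cluster_id, []))
    return annotated
```

For a multi-entity entry, only chains from the clusters of the UniProt-annotated entities are returned; chains clustered with the other entities are left unannotated.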