PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at Identity 90%

This chart shows the annual and cumulative numbers of unique protein sequences in released PDB structures since the beginning of the PDB archive. It can be viewed at several levels of sequence identity. The cumulative bars show the growth in unique protein sequences (number of polymeric entities) over time. The yearly bars (dark blue) show how many new protein sequences were added in a given year.
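
The relationship between the two series can be illustrated with a minimal sketch: each cumulative total is the running sum of the yearly counts up to and including that year. The yearly values below are copied from the first five rows of the table (1976 to 1980); the names `new_per_year` and `cumulative_totals` are hypothetical and are not part of any PDB tool or API.

```python
from itertools import accumulate

# Yearly counts of new protein sequences, copied from the table (1976-1980).
new_per_year = {1976: 13, 1977: 10, 1978: 3, 1979: 6, 1980: 4}


def cumulative_totals(yearly):
    """Return {year: running total of new sequences up to that year}."""
    years = sorted(yearly)
    running = accumulate(yearly[year] for year in years)
    return dict(zip(years, running))


totals = cumulative_totals(new_per_year)
for year, total in totals.items():
    # Columns: year, new sequences that year, cumulative total.
    print(year, new_per_year[year], total)
```

For these five years the running totals come out as 13, 23, 26, 32 and 36, which matches the cumulative column of the table.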

Note: The total number of sequence clusters in the statistics table links to the sequence cluster group search result page. For performance reasons, the statistics are calculated with a default precision threshold, so a count may differ slightly from the actual non-redundant group search result when the result count approaches or exceeds 10,000. The group search result page provides the accurate count; the statistics page shows the trend.

Year  Number of New Protein Sequences  Total Number of Protein Sequences
1976  13                               13
1977  10                               23
1978  3                                26
1979  6                                32
1980  4                                36
1981  8                                44
1982  18                               62
1983  11                               73
1984  11                               84
1985  12                               96
1986  9                                105
1987  11                               116
1988  25                               141
1989  45                               186
1990  51                               237
1991  54                               291
1992  67                               358
1993  221                              579
1994  436                              1,015
1995  328                              1,343
1996  391                              1,734
1997  541                              2,275
1998  716                              2,991
1999  843                              3,834
2000  951                              4,785
2001  1,006                            5,791
2002  1,061                            6,852
2003  1,487                            8,339
2004  2,047                            10,386
2005  2,270                            12,656
2006  2,526                            15,182
2007  2,866                            18,048
2008  2,648                            20,696
2009  2,698                            23,394
2010  2,739                            26,133
2011  2,504                            28,637
2012  2,709                            31,346
2013  2,930                            34,276
2014  3,576                            37,852
2015  2,937                            40,789
2016  3,429                            44,218
2017  3,617                            47,835
2018  3,305                            51,140
2019  3,688                            54,828
2020  4,591                            59,419
2021  4,023                            63,442
2022  4,893                            68,335
2023  4,578                            72,913
2024  4,885                            77,798
2025  5,532                            83,330
2026  1,150                            84,480