PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at Identity 95%

This chart shows the annual and cumulative numbers of protein sequences in released PDB structures since the beginning of the PDB archive, and it can be viewed at several levels of sequence identity. The cumulative bars show the growth in unique protein sequences (number of polymer entities) over time. The yearly bars (dark blue) show how many new protein sequences were added in each year.
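
The two bar series are related by a running sum: each year's cumulative total is the previous year's total plus that year's new sequences. The following is a minimal Python sketch of that relationship, not an RCSB tool. It uses the yearly counts for 1976 to 1985 copied by hand from the table on this page; the variable names are illustrative only.

```python
from itertools import accumulate

# Yearly counts of new protein sequences (95% identity), 1976-1985,
# copied by hand from the statistics table on this page.
years = list(range(1976, 1986))
new_sequences = [13, 10, 3, 6, 4, 9, 18, 11, 11, 12]

# The cumulative bars are the running sum of the yearly bars.
cumulative = list(accumulate(new_sequences))

for year, new, total in zip(years, new_sequences, cumulative):
    print(f"{year}: +{new} -> {total}")
```

The running sums reproduce the "Total Number of Protein Sequences" column for those years (13, 23, 26, ..., 97).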

Note: In the statistics table, the total number of sequence clusters links to the corresponding sequence cluster group search results page. To balance accuracy against performance, the statistics are calculated with a default precision threshold. As a result, when the count approaches or exceeds 10,000, the number shown here may differ slightly from the non-redundant group search result. Use the group search results page for an exact count and this statistics page for the overall trend.

Sequence cluster level: 95% identity

Year | Number of New Protein Sequences | Total Number of Protein Sequences
1976 | 13 | 13
1977 | 10 | 23
1978 | 3 | 26
1979 | 6 | 32
1980 | 4 | 36
1981 | 9 | 45
1982 | 18 | 63
1983 | 11 | 74
1984 | 11 | 85
1985 | 12 | 97
1986 | 9 | 106
1987 | 11 | 117
1988 | 25 | 142
1989 | 45 | 187
1990 | 51 | 238
1991 | 56 | 294
1992 | 65 | 359
1993 | 226 | 585
1994 | 453 | 1,038
1995 | 344 | 1,382
1996 | 400 | 1,782
1997 | 549 | 2,331
1998 | 748 | 3,079
1999 | 890 | 3,969
2000 | 987 | 4,956
2001 | 1,045 | 6,001
2002 | 1,097 | 7,098
2003 | 1,546 | 8,644
2004 | 2,117 | 10,761
2005 | 2,332 | 13,093
2006 | 2,615 | 15,708
2007 | 2,962 | 18,670
2008 | 2,746 | 21,416
2009 | 2,802 | 24,218
2010 | 2,845 | 27,063
2011 | 2,612 | 29,675
2012 | 2,875 | 32,550
2013 | 3,071 | 35,621
2014 | 3,778 | 39,399
2015 | 3,121 | 42,520
2016 | 3,694 | 46,214
2017 | 3,925 | 50,139
2018 | 3,729 | 53,868
2019 | 4,079 | 57,947
2020 | 4,984 | 62,931
2021 | 4,467 | 67,398
2022 | 5,504 | 72,902
2023 | 5,214 | 78,116
2024 | 5,481 | 83,597
2025 | 5,818 | 89,415
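
Because each total should equal the previous total plus that year's new sequences, the table can be checked and its trend summarized with a few lines of code. The following is a minimal Python sketch, not an RCSB tool. The rows for 2015 to 2025 are copied by hand from the table above, and `check_consistency` and `growth_rates` are hypothetical helper names introduced only for this illustration.

```python
# (year, new sequences, cumulative total) at 95% identity, 2015-2025,
# copied by hand from the statistics table.
rows = [
    (2015, 3121, 42520),
    (2016, 3694, 46214),
    (2017, 3925, 50139),
    (2018, 3729, 53868),
    (2019, 4079, 57947),
    (2020, 4984, 62931),
    (2021, 4467, 67398),
    (2022, 5504, 72902),
    (2023, 5214, 78116),
    (2024, 5481, 83597),
    (2025, 5818, 89415),
]


def check_consistency(rows):
    """Hypothetical helper: return years whose total != previous total + new count."""
    bad = []
    for (_, _, prev_total), (year, new, total) in zip(rows, rows[1:]):
        if prev_total + new != total:
            bad.append(year)
    return bad


def growth_rates(rows):
    """Hypothetical helper: percent growth of the cumulative total over the previous year."""
    return {
        year: 100.0 * new / prev_total
        for (_, _, prev_total), (year, new, _) in zip(rows, rows[1:])
    }


rates = growth_rates(rows)
print("inconsistent years:", check_consistency(rows))
for year, pct in rates.items():
    print(f"{year}: cumulative total grew {pct:.2f}%")
```

Measured against the previous year's cumulative total, growth over these years stays between about 7% and 9% per year, even though the absolute number of new sequences added each year keeps rising.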