PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at Identity 100%

This chart shows the annual and cumulative numbers of unique protein sequences in released PDB structures since the beginning of the PDB archive. It can be viewed at several levels of sequence identity. The cumulative bars show the growth in unique protein sequences (number of polymeric entities) over time. The yearly bars (dark blue) show how many new protein sequences were added in a given year.
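The relationship between the yearly and cumulative bars is a running sum: each cumulative value is the previous total plus that year's new sequences. A minimal sketch in Python, using the 1976-1980 values from the table on this page, is given here. The function name `cumulative_totals` is illustrative only and is not part of any PDB tool or API.

```python
from itertools import accumulate

# Yearly counts of new protein sequences for 1976-1980, copied from the table.
new_per_year = {1976: 13, 1977: 12, 1978: 3, 1979: 6, 1980: 5}

def cumulative_totals(yearly):
    """Return {year: running total} with years in ascending order.

    Hypothetical helper for illustration; not part of any PDB software.
    """
    years = sorted(yearly)
    totals = accumulate(yearly[y] for y in years)
    return dict(zip(years, totals))

totals = cumulative_totals(new_per_year)
for year, total in totals.items():
    # Print year, new sequences that year, and the cumulative total.
    print(year, new_per_year[year], total)
```

The same check can be applied to any row of the table: the total for a year should equal the previous year's total plus the number of new sequences.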

Note: The total number of sequence clusters in the statistics table is linked to the sequence cluster group search result page. To balance performance, the numbers are calculated with a default precision threshold, so the statistics count may differ slightly from the actual non-redundant group search result when the result count approaches or exceeds 10,000. The group search result page provides an accurate count; the statistics page shows the trend.

Year  Number of New Protein Sequences  Total Number of Protein Sequences
1976  13                               13
1977  12                               25
1978  3                                28
1979  6                                34
1980  5                                39
1981  10                               49
1982  19                               68
1983  13                               81
1984  12                               93
1985  12                               105
1986  9                                114
1987  11                               125
1988  43                               168
1989  45                               213
1990  81                               294
1991  104                              398
1992  88                               486
1993  361                              847
1994  684                              1,531
1995  497                              2,028
1996  587                              2,615
1997  783                              3,398
1998  1119                             4,517
1999  1279                             5,796
2000  1432                             7,228
2001  1492                             8,720
2002  1540                             10,260
2003  2155                             12,415
2004  2872                             15,287
2005  3044                             18,331
2006  3513                             21,844
2007  4030                             25,874
2008  3760                             29,634
2009  3842                             33,476
2010  4027                             37,503
2011  3866                             41,369
2012  4169                             45,538
2013  4623                             50,161
2014  5380                             55,541
2015  4630                             60,171
2016  5346                             65,517
2017  5607                             71,124
2018  5500                             76,624
2019  6035                             82,659
2020  7125                             89,784
2021  6532                             96,316
2022  7732                             104,048
2023  7368                             111,416
2024  7821                             119,237
2025  8816                             128,053
2026  2034                             130,087