PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at Identity 30%

This chart shows the annual and cumulative numbers of unique protein sequences in released PDB structures since the beginning of the PDB archive. It can be viewed at several levels of sequence identity. The cumulative bars show the growth in unique protein sequences (number of polymeric entities) over time. The yearly bars (dark blue) show how many new protein sequences were added in a given year.

Note: The total number of sequence clusters in the statistics table is linked to the sequence cluster group search result page. To balance performance, the statistics are calculated with a default precision threshold. As a result, the counts may differ slightly from the actual non-redundant group search result when the result count approaches or exceeds 10,000. The group search result page gives the exact count; the statistics page shows the trend.

Year  New sequences  Total sequences
1976             12               12
1977              9               21
1978              3               24
1979              2               26
1980              3               29
1981              7               36
1982             14               50
1983              7               57
1984             11               68
1985             10               78
1986              8               86
1987              8               94
1988             18              112
1989             28              140
1990             33              173
1991             41              214
1992             52              266
1993            144              410
1994            297              707
1995            223              930
1996            265            1,195
1997            380            1,575
1998            461            2,036
1999            559            2,595
2000            637            3,232
2001            679            3,911
2002            691            4,602
2003            970            5,572
2004          1,378            6,950
2005          1,456            8,406
2006          1,598           10,004
2007          1,693           11,697
2008          1,557           13,254
2009          1,484           14,738
2010          1,423           16,161
2011          1,252           17,413
2012          1,351           18,764
2013          1,442           20,206
2014          1,673           21,879
2015          1,390           23,269
2016          1,591           24,860
2017          1,639           26,499
2018          1,629           28,128
2019          1,664           29,792
2020          2,063           31,855
2021          1,633           33,488
2022          2,043           35,531
2023          2,001           37,532
2024          2,027           39,559
2025          2,239           41,798
2026            460           42,258
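The total for each year is the running sum of the yearly counts: the previous year's total plus that year's new sequences. The minimal Python sketch below rebuilds the cumulative column from the yearly counts in the table. The names `NEW_PER_YEAR`, `FIRST_YEAR` and `cumulative_totals` are illustrative only and are not part of any PDB tool or API.

```python
from itertools import accumulate

# Illustrative data: yearly counts of new protein sequences at the
# 30% identity cluster level, copied from the table (1976 onward).
FIRST_YEAR = 1976
NEW_PER_YEAR = [
    12, 9, 3, 2, 3, 7, 14, 7, 11, 10,                            # 1976-1985
    8, 8, 18, 28, 33, 41, 52, 144, 297, 223,                     # 1986-1995
    265, 380, 461, 559, 637, 679, 691, 970, 1378, 1456,          # 1996-2005
    1598, 1693, 1557, 1484, 1423, 1252, 1351, 1442, 1673, 1390,  # 2006-2015
    1591, 1639, 1629, 1664, 2063, 1633, 2043, 2001, 2027, 2239,  # 2016-2025
    460,                                                         # 2026
]


def cumulative_totals(new_per_year, first_year):
    """Map each year to its running total of unique protein sequences."""
    years = range(first_year, first_year + len(new_per_year))
    # accumulate() yields partial sums: total[i] = new[0] + ... + new[i]
    return dict(zip(years, accumulate(new_per_year)))


if __name__ == "__main__":
    totals = cumulative_totals(NEW_PER_YEAR, FIRST_YEAR)
    for year in (1976, 1996, 2006, 2026):
        print(year, totals[year])
```

Running it prints 12, 1195, 10004 and 42258 for the four sample years. These values match the Total column, so each published total is the sum of all yearly counts up to and including that year.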