PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at Identity 70%

This chart shows the annual and cumulative numbers of unique protein sequences in released PDB structures since the beginning of the PDB archive. It can be viewed at several levels of sequence identity. The cumulative bars show the growth in the number of unique protein sequences (number of polymeric entities) over time. The yearly bars (dark blue) show how many new protein sequences were added in each year.

Note: The total number of sequence clusters in the statistics table links to the corresponding sequence cluster group search result page. To balance performance, the statistics counts are calculated with a default precision threshold, so they may differ slightly from the actual non-redundant group search result once the result count approaches or exceeds 10,000. The group search result page provides the accurate count; the statistics page shows the trend.

Sequence cluster level: 70% sequence identity

Year  Number of New Protein Sequences  Total Number of Protein Sequences
1976  13                               13
1977  8                                21
1978  3                                24
1979  2                                26
1980  4                                30
1981  8                                38
1982  17                               55
1983  9                                64
1984  10                               74
1985  12                               86
1986  8                                94
1987  11                               105
1988  23                               128
1989  40                               168
1990  41                               209
1991  49                               258
1992  59                               317
1993  179                              496
1994  347                              843
1995  275                              1,118
1996  327                              1,445
1997  457                              1,902
1998  593                              2,495
1999  710                              3,205
2000  823                              4,028
2001  875                              4,903
2002  931                              5,834
2003  1315                             7,149
2004  1817                             8,966
2005  1997                             10,963
2006  2231                             13,194
2007  2458                             15,652
2008  2288                             17,940
2009  2301                             20,241
2010  2297                             22,538
2011  2051                             24,589
2012  2206                             26,795
2013  2338                             29,133
2014  2827                             31,960
2015  2288                             34,248
2016  2572                             36,820
2017  2673                             39,493
2018  2606                             42,099
2019  2795                             44,894
2020  3450                             48,344
2021  2825                             51,169
2022  3579                             54,748
2023  3398                             58,146
2024  3470                             61,616
2025  3978                             65,594
2026  784                              66,378
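
In the table above, each cumulative total equals the running sum of the yearly counts up to and including that year. The following Python sketch illustrates that relationship using the first five rows (1976 to 1980). The function name `cumulative_totals` and the variable `yearly` are hypothetical names chosen for this illustration; they are not part of any RCSB PDB tool or API.

```python
from itertools import accumulate

# (year, number of new protein sequences), copied from the first five table rows.
yearly = [(1976, 13), (1977, 8), (1978, 3), (1979, 2), (1980, 4)]


def cumulative_totals(rows):
    """Return (year, new, running_total) tuples from (year, new) pairs.

    Hypothetical helper for illustration only.
    """
    # accumulate() yields the running sum of the yearly counts.
    totals = accumulate(new for _, new in rows)
    return [(year, new, total) for (year, new), total in zip(rows, totals)]


for year, new, total in cumulative_totals(yearly):
    # Thousands separator on the total, matching the table's formatting.
    print(f"{year}  {new:>5}  {total:>7,}")
```

Applying the same running sum to the full yearly column should reproduce the cumulative column row by row.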