PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at Identity 70%

This chart shows the annual and cumulative numbers of protein sequences in released PDB structures, starting from the beginning of the PDB archive. It can be viewed at several levels of sequence identity. The cumulative bars show the growth in unique protein sequences (counted as polymeric entities) over time. The yearly bars (dark blue) show how many new protein sequences were added in each year.

Note: The total number of sequence clusters in the statistics table links to the sequence cluster group search result page. For performance reasons, the numbers are calculated with a default precision threshold, so the statistics count may differ slightly from the actual non-redundant group search result when the result count approaches or exceeds 10,000. The group search result page provides the accurate count; the statistics page shows the trend.

Sequence cluster level: 70% identity

Year  Number of New Protein Sequences  Total Number of Protein Sequences
1976  13                               13
1977  8                                21
1978  3                                24
1979  2                                26
1980  4                                30
1981  8                                38
1982  18                               56
1983  9                                65
1984  10                               75
1985  12                               87
1986  9                                96
1987  11                               107
1988  22                               129
1989  40                               169
1990  39                               208
1991  51                               259
1992  57                               316
1993  179                              495
1994  348                              843
1995  277                              1,120
1996  327                              1,447
1997  458                              1,905
1998  595                              2,500
1999  708                              3,208
2000  817                              4,025
2001  884                              4,909
2002  926                              5,835
2003  1310                             7,145
2004  1812                             8,957
2005  2003                             10,960
2006  2220                             13,180
2007  2451                             15,631
2008  2280                             17,911
2009  2306                             20,217
2010  2301                             22,518
2011  2057                             24,575
2012  2200                             26,775
2013  2333                             29,108
2014  2826                             31,934
2015  2287                             34,221
2016  2572                             36,793
2017  2668                             39,461
2018  2621                             42,082
2019  2799                             44,881
2020  3459                             48,340
2021  2896                             51,236
2022  3475                             54,711
2023  3393                             58,104
2024  3513                             61,617
2025  3634                             65,251
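
The relationship between the two count columns can be checked with a short script. The sketch below is only an illustration, not part of the PDB statistics service: it assumes that each total is the running sum of the yearly new-sequence counts, and it uses the first five rows of the table above. The names `rows` and `running_totals` are hypothetical.

```python
from itertools import accumulate

# (year, new sequences, listed cumulative total) for the first five table rows.
rows = [
    (1976, 13, 13),
    (1977, 8, 21),
    (1978, 3, 24),
    (1979, 2, 26),
    (1980, 4, 30),
]


def running_totals(new_counts):
    """Return the cumulative sum of the yearly new-sequence counts."""
    return list(accumulate(new_counts))


computed = running_totals([new for _, new, _ in rows])
listed = [total for _, _, total in rows]

# Collect any year whose listed total differs from the computed running sum.
mismatches = [
    (year, calc, shown)
    for (year, _, _), calc, shown in zip(rows, computed, listed)
    if calc != shown
]

print(computed)    # [13, 21, 24, 26, 30]
print(mismatches)  # []
```

An empty `mismatches` list means the listed totals agree with the running sum for the rows checked. Extending `rows` with the remaining years applies the same check to the whole table.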