PDB Statistics: Growth in the Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at 90% Sequence Identity

This chart shows the annual and cumulative numbers of protein sequences in released PDB structures since the beginning of the PDB archive. It can be viewed at several levels of sequence identity. The cumulative bars show the growth in unique protein sequences (measured as the number of polymeric entities) over the history of the archive. The yearly bars (dark blue) show how many new protein sequences were added in each year.
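The yearly and cumulative series are related by a running sum. Writing N_y for the number of new protein sequences added in year y and T_y for the total through year y (symbols introduced here only for illustration), with 1976 as the first year in the table:

```latex
T_y = \sum_{t=1976}^{y} N_t, \qquad T_y = T_{y-1} + N_y \quad (y > 1976), \qquad T_{1976} = N_{1976} = 13
```

For example, the 1993 row gives 354 + 217 = 571.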

Note: The total number of sequence clusters in the statistics table links to the sequence cluster group search results page. To balance precision against performance, the statistics are calculated with a default precision threshold, so a count may differ slightly from the actual non-redundant group search result when it approaches or exceeds 10,000. Use the group search results page for an exact count and this statistics page for the overall trend.
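The note does not say how the statistics counts are computed. As a rough illustration of how a precision threshold can make large counts drift, here is a minimal Python sketch of one well-known approximate distinct-count technique, the k-minimum-values (KMV) sketch: it keeps only the k smallest hash values, so it is exact below k distinct items and an estimate above that. This is an assumption for illustration, not the RCSB implementation; the function names, the BLAKE2 hash, and the default `k = 1024` are hypothetical choices.

```python
import hashlib
import heapq


def unit_hash(item: str) -> float:
    """Map an item to a stable pseudo-random value in (0, 1]."""
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
    return (int.from_bytes(digest, "big") + 1) / 2**64


def approx_distinct_count(items, k: int = 1024) -> float:
    """Count distinct items with a k-minimum-values (KMV) sketch.

    Memory stays bounded by k. The result is exact while fewer than k
    distinct hashes have been seen and an estimate once the sketch is full.
    """
    heap = []     # negated hashes, so -heap[0] is the largest value kept
    kept = set()  # the same hashes, for constant-time duplicate checks
    for item in items:
        h = unit_hash(item)
        if h in kept:
            continue  # duplicate of an item already in the sketch
        if len(heap) < k:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:
            # Replace the largest kept hash with the smaller new one.
            evicted = -heapq.heapreplace(heap, -h)
            kept.remove(evicted)
            kept.add(h)
    if len(heap) < k:
        return float(len(heap))  # below the threshold: exact count
    return (k - 1) / -heap[0]    # sketch full: standard KMV estimate


if __name__ == "__main__":
    small = [f"entity-{i}" for i in range(500)] * 3  # 500 distinct, repeated
    large = [f"entity-{i}" for i in range(20_000)]   # 20,000 distinct
    print(approx_distinct_count(small))  # exact: 500.0
    print(approx_distinct_count(large))  # close to 20000, usually not exact
```

Raising `k` tightens the estimate (the relative error shrinks roughly as 1/sqrt(k)) at the cost of memory, which is the same kind of precision-versus-performance trade-off the note describes.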

New = Number of New Protein Sequences added in that year; Total = Total Number of Protein Sequences (cumulative through that year).

Year    New   Total
1976     13      13
1977     10      23
1978      3      26
1979      6      32
1980      4      36
1981      8      44
1982     18      62
1983     11      73
1984     11      84
1985     12      96
1986      9     105
1987     11     116
1988     25     141
1989     45     186
1990     48     234
1991     54     288
1992     66     354
1993    217     571
1994    427     998
1995    322   1,320
1996    385   1,705
1997    531   2,236
1998    704   2,940
1999    838   3,778
2000    937   4,715
2001    999   5,714
2002  1,059   6,773
2003  1,480   8,253
2004  2,045  10,298
2005  2,249  12,547
2006  2,520  15,067
2007  2,866  17,933
2008  2,637  20,570
2009  2,692  23,262
2010  2,720  25,982
2011  2,489  28,471
2012  2,695  31,166
2013  2,908  34,074
2014  3,570  37,644
2015  2,926  40,570
2016  3,398  43,968
2017  3,602  47,570
2018  3,489  51,059
2019  3,704  54,763
2020  4,468  59,231
2021  4,029  63,260
2022  4,924  68,184
2023  4,677  72,861
2024  4,939  77,800
2025  5,127  82,927
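Because each year's total is the previous total plus that year's new sequences, the rows of the table can be checked mechanically. A minimal Python sketch, with the values copied from the table; the helper names `running_totals` and `new_from_totals` are illustrative and not part of any RCSB tool.

```python
from itertools import accumulate

YEARS = list(range(1976, 2026))  # 1976 through 2025, one entry per table row

# Number of New Protein Sequences per year, copied from the table.
NEW = [
    13, 10, 3, 6, 4, 8, 18, 11, 11, 12,                          # 1976-1985
    9, 11, 25, 45, 48, 54, 66, 217, 427, 322,                    # 1986-1995
    385, 531, 704, 838, 937, 999, 1059, 1480, 2045, 2249,        # 1996-2005
    2520, 2866, 2637, 2692, 2720, 2489, 2695, 2908, 3570, 2926,  # 2006-2015
    3398, 3602, 3489, 3704, 4468, 4029, 4924, 4677, 4939, 5127,  # 2016-2025
]

# Total Number of Protein Sequences (cumulative) per year, copied from the table.
TOTAL = [
    13, 23, 26, 32, 36, 44, 62, 73, 84, 96,
    105, 116, 141, 186, 234, 288, 354, 571, 998, 1320,
    1705, 2236, 2940, 3778, 4715, 5714, 6773, 8253, 10298, 12547,
    15067, 17933, 20570, 23262, 25982, 28471, 31166, 34074, 37644, 40570,
    43968, 47570, 51059, 54763, 59231, 63260, 68184, 72861, 77800, 82927,
]


def running_totals(new_per_year):
    """Cumulative totals: each year's total is the previous total plus its new count."""
    return list(accumulate(new_per_year))


def new_from_totals(totals):
    """Invert running_totals: yearly additions are differences of consecutive totals."""
    return [t - prev for prev, t in zip([0] + totals[:-1], totals)]


if __name__ == "__main__":
    assert len(YEARS) == len(NEW) == len(TOTAL)
    assert running_totals(NEW) == TOTAL  # every row is internally consistent
    assert new_from_totals(TOTAL) == NEW
    year, most = max(zip(YEARS, NEW), key=lambda row: row[1])
    print(f"Largest yearly addition: {most:,} in {year}")  # 5,127 in 2025
    print(f"Total through {YEARS[-1]}: {TOTAL[-1]:,}")     # 82,927
```

A failed assertion would point to a transcription error in a specific row; differencing the totals recovers the yearly column exactly.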