PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at Identity 100%

This chart shows the annual and cumulative numbers of unique protein sequences in released PDB structures since the beginning of the PDB archive. The chart can be viewed at several levels of sequence identity. The cumulative bars show the growth in unique protein sequences (counted as polymeric entities) over time. The yearly bars (dark blue) show how many new protein sequences were added in a given year.

Note: The total number of sequence clusters in the statistics table links to the sequence cluster group search result page. The numbers are calculated with a default precision threshold to balance performance against accuracy, so the statistics count may differ slightly from the actual non-redundant group search result when the result count approaches or exceeds 10,000. The group search result page provides the accurate count; the statistics page shows the trend.

Year    Number of New Protein Sequences    Total Number of Protein Sequences
1976                                 13                                   13
1977                                 12                                   25
1978                                  3                                   28
1979                                  6                                   34
1980                                  5                                   39
1981                                 10                                   49
1982                                 19                                   68
1983                                 13                                   81
1984                                 12                                   93
1985                                 12                                  105
1986                                  9                                  114
1987                                 11                                  125
1988                                 43                                  168
1989                                 44                                  212
1990                                 81                                  293
1991                                103                                  396
1992                                 88                                  484
1993                                357                                  841
1994                                678                                1,519
1995                                497                                2,016
1996                                582                                2,598
1997                                778                                3,376
1998                               1111                                4,487
1999                               1262                                5,749
2000                               1414                                7,163
2001                               1491                                8,654
2002                               1527                               10,181
2003                               2145                               12,326
2004                               2858                               15,184
2005                               3021                               18,205
2006                               3491                               21,696
2007                               4023                               25,719
2008                               3725                               29,444
2009                               3812                               33,256
2010                               4004                               37,260
2011                               3856                               41,116
2012                               4151                               45,267
2013                               4560                               49,827
2014                               5365                               55,192
2015                               4659                               59,851
2016                               5430                               65,281
2017                               5649                               70,930
2018                               5431                               76,361
2019                               5972                               82,333
2020                               6905                               89,238
2021                               6519                               95,757
2022                               7711                              103,468
2023                               7285                              110,753
2024                               7682                              118,435
2025                               8057                              126,492
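
The relationship between the two data columns can be checked directly: each cumulative total should equal the running sum of the yearly counts up to that year. The following is a minimal Python sketch of that check, using only the first five rows of the table. The names `yearly_new`, `reported_totals`, and `cumulative_totals` are illustrative assumptions and are not part of any RCSB PDB tooling.

```python
from itertools import accumulate

# (year, number of new protein sequences) for the first five table rows
yearly_new = [(1976, 13), (1977, 12), (1978, 3), (1979, 6), (1980, 5)]

# Totals as reported in the table for the same five years
reported_totals = [13, 25, 28, 34, 39]


def cumulative_totals(rows):
    """Return the running sum of the per-year counts, in row order."""
    return list(accumulate(count for _, count in rows))


computed = cumulative_totals(yearly_new)

# Print each year with its new count, computed total, and whether it
# matches the reported total.
for (year, new), total, reported in zip(yearly_new, computed, reported_totals):
    status = "ok" if total == reported else "mismatch"
    print(f"{year}  {new:>5}  {total:>7,}  {status}")
```

The same function could be applied to the full table to confirm that every reported total is consistent with the yearly counts.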