PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at 50% Sequence Identity

This chart shows the annual and cumulative numbers of unique protein sequences in released PDB structures since the beginning of the PDB archive. It can be viewed at several levels of sequence identity. The cumulative bars show the growth in unique protein sequences (counted as polymeric entities) over time, and the yearly bars (dark blue) show how many new protein sequences were added in each year.
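The two kinds of bars are related by a running sum: each year's cumulative total equals the previous year's total plus that year's new sequences. A minimal Python sketch of this relationship, using the 1976 to 1981 values from the table on this page (the variable names are illustrative only):

```python
from itertools import accumulate

# Yearly "new protein sequence" counts for 1976-1981, copied from the table on this page.
years = [1976, 1977, 1978, 1979, 1980, 1981]
new_per_year = [13, 8, 3, 2, 4, 8]

# A cumulative bar is the running sum of all yearly bars up to and including that year.
cumulative = list(accumulate(new_per_year))

for year, new, total in zip(years, new_per_year, cumulative):
    print(f"{year}: +{new} new, {total} total")
```

The resulting totals (13, 21, 24, 26, 30, 38) match the "Total Number of Protein Sequences" column for those years.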

Note: The total number of sequence clusters in the statistics table links to the corresponding sequence-cluster group search results page. To balance accuracy against performance, these statistics are computed with a default precision threshold, so when the count approaches or exceeds 10,000 it may differ slightly from the actual non-redundant group search result. Use the group search results page for an exact count and this statistics page for the overall trend.

Sequence cluster level: 50% identity

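| Year | Number of New Protein Sequences | Total Number of Protein Sequences |
|------|--------------------------------:|----------------------------------:|
| 1976 | 13 | 13 |
| 1977 | 8 | 21 |
| 1978 | 3 | 24 |
| 1979 | 2 | 26 |
| 1980 | 4 | 30 |
| 1981 | 8 | 38 |
| 1982 | 17 | 55 |
| 1983 | 8 | 63 |
| 1984 | 10 | 73 |
| 1985 | 10 | 83 |
| 1986 | 8 | 91 |
| 1987 | 10 | 101 |
| 1988 | 21 | 122 |
| 1989 | 33 | 155 |
| 1990 | 33 | 188 |
| 1991 | 45 | 233 |
| 1992 | 51 | 284 |
| 1993 | 163 | 447 |
| 1994 | 316 | 763 |
| 1995 | 248 | 1,011 |
| 1996 | 298 | 1,309 |
| 1997 | 420 | 1,729 |
| 1998 | 521 | 2,250 |
| 1999 | 640 | 2,890 |
| 2000 | 734 | 3,624 |
| 2001 | 785 | 4,409 |
| 2002 | 811 | 5,220 |
| 2003 | 1,184 | 6,404 |
| 2004 | 1,614 | 8,018 |
| 2005 | 1,764 | 9,782 |
| 2006 | 1,966 | 11,748 |
| 2007 | 2,141 | 13,889 |
| 2008 | 1,987 | 15,876 |
| 2009 | 1,968 | 17,844 |
| 2010 | 1,942 | 19,786 |
| 2011 | 1,709 | 21,495 |
| 2012 | 1,842 | 23,337 |
| 2013 | 1,940 | 25,277 |
| 2014 | 2,268 | 27,545 |
| 2015 | 1,857 | 29,402 |
| 2016 | 2,145 | 31,547 |
| 2017 | 2,148 | 33,695 |
| 2018 | 2,150 | 35,845 |
| 2019 | 2,257 | 38,102 |
| 2020 | 2,744 | 40,846 |
| 2021 | 2,215 | 43,061 |
| 2022 | 2,859 | 45,920 |
| 2023 | 2,725 | 48,645 |
| 2024 | 2,878 | 51,523 |
| 2025 | 2,951 | 54,474 |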
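Going the other way, the yearly counts can be recovered from the cumulative column by differencing consecutive totals, which also serves as a quick consistency check when copying these figures elsewhere. A minimal Python sketch using the 2020 to 2025 rows of the table (the dictionary names are illustrative only):

```python
# Cumulative totals and reported yearly counts for 2020-2025, copied from the table.
totals = {2020: 40846, 2021: 43061, 2022: 45920, 2023: 48645, 2024: 51523, 2025: 54474}
reported_new = {2021: 2215, 2022: 2859, 2023: 2725, 2024: 2878, 2025: 2951}

# Differencing each total against the previous year's total recovers that year's new count.
years = sorted(totals)
derived_new = {year: totals[year] - totals[prev] for prev, year in zip(years, years[1:])}

# Collect any year where the derived count disagrees with the table's "new" column.
mismatches = {
    year: (derived_new[year], reported_new[year])
    for year in derived_new
    if derived_new[year] != reported_new[year]
}

print(derived_new)
print("consistent" if not mismatches else f"mismatches: {mismatches}")
```

For these rows every derived value equals the reported yearly count, so the sketch prints "consistent".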