Sequence Clustering Update

05/03

The coverage requirement for sequence clustering of polymer entities has been relaxed from 90% to 80%

Sequence cluster groups enable exploration of sets of homologous sequences and can reveal trends across hundreds of related proteins.

RCSB.org offers data files that contain the results of the weekly clustering of protein sequences in the PDB by MMseqs2 at 30%, 40%, 50%, 70%, 90%, 95%, and 100% sequence identity. Note that these files use polymer entity identifiers instead of chain identifiers to avoid redundancy. The files are plain text with one cluster per line, sorted from largest to smallest cluster.
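As a rough illustration of how such a file could be read, the Python sketch below parses one cluster per line and looks up the cluster that contains a given polymer entity. It assumes that members on a line are separated by whitespace and that polymer entity identifiers look like `1AAA_1`; the sample identifiers and the helper names `parse_clusters` and `find_cluster` are made up for this example and are not part of any RCSB tool.

```python
from typing import Iterable, List, Optional, Tuple


def parse_clusters(lines: Iterable[str]) -> List[List[str]]:
    """Parse one cluster per line.

    Assumption: members on a line are separated by whitespace.
    """
    clusters = []
    for line in lines:
        members = line.split()
        if members:  # skip blank lines
            clusters.append(members)
    return clusters


def find_cluster(
    clusters: List[List[str]], entity_id: str
) -> Optional[Tuple[int, List[str]]]:
    """Return (rank, members) of the first cluster containing entity_id.

    Rank 1 is the first line of the file, which is the largest cluster
    because the file is sorted from largest to smallest.
    """
    for rank, members in enumerate(clusters, start=1):
        if entity_id in members:
            return rank, members
    return None


# Made-up sample content; real files hold many thousands of lines.
sample = """\
1AAA_1 2BBB_1 3CCC_2
4DDD_1 5EEE_1
6FFF_1
"""

clusters = parse_clusters(sample.splitlines())
hit = find_cluster(clusters, "5EEE_1")
```

Because the file order already encodes cluster size, the line position can serve directly as a size rank without re-sorting.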

The Advanced Search Group option also simplifies PDB searching by generating a non-redundant search result set based on sequence identity clustering (as well as UniProt ID and group depositions).

The clustering requires a meaningful overlap between sequences (in addition to their sequence identity). This coverage requirement has been relaxed from 90% to 80%, which is the coverage threshold used by UniRef. This change addresses some unintuitive clusterings, where highly similar sequences were assigned to different clusters.
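The effect of the threshold change can be sketched with a small calculation. The Python example below assumes that coverage is the aligned length divided by the sequence length and that the threshold must be met for both sequences; the article does not state the exact definition used, so treat both points as assumptions. The lengths are hypothetical.

```python
def coverage(aligned_length: int, sequence_length: int) -> float:
    """Fraction of a sequence that is covered by the alignment."""
    return aligned_length / sequence_length


def meets_coverage(
    aligned_length: int, len_a: int, len_b: int, threshold: float
) -> bool:
    """Assumption: the alignment must cover both sequences to the threshold."""
    return (
        coverage(aligned_length, len_a) >= threshold
        and coverage(aligned_length, len_b) >= threshold
    )


# Hypothetical pair: a 100-residue construct that aligns in full to a
# 118-residue construct of the same protein (for example, with an added tag).
aligned, len_a, len_b = 100, 100, 118

old_ok = meets_coverage(aligned, len_a, len_b, 0.90)  # 100/118 is below 0.90
new_ok = meets_coverage(aligned, len_a, len_b, 0.80)  # 100/118 is above 0.80
```

Under these assumptions the pair fails the 90% requirement but passes at 80%, which is the kind of case where highly similar sequences could previously land in different clusters.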

Consequently, the sequence clusters offered are slightly larger on average and fewer in number. Some group identifiers have changed. Redundancy-filtered result sets, which collapse similar polymer entities into groups, can now be navigated more efficiently because there are fewer groups to explore.

User guides are available for Grouping Structures and Sequence-based Clustering.

RCSB PDB Core Operations are funded by the U.S. National Science Foundation (DBI-2321666), the US Department of Energy (DE-SC0019749), and the National Cancer Institute, National Institute of Allergy and Infectious Diseases, and National Institute of General Medical Sciences of the National Institutes of Health under grant R01GM157729. RCSB PDB uses resources of the National Energy Research Scientific Computing Center (NERSC), a Department of Energy User Facility.