7KUW | pdb_00007kuw

High-throughput design and refinement of stable proteins using sequence-only models

PDB DOI: https://doi.org/10.2210/pdb7KUW/pdb
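
The two identifiers in the entry header and the DOI above follow a regular pattern. The sketch below reproduces them for this entry. It assumes the extended form is always `pdb_` followed by the classic ID, lower-cased and left-padded with zeros to eight characters, as it is here. The function names `to_extended_id` and `to_doi_url` are hypothetical and not part of any RCSB library.

```python
def to_extended_id(classic_id: str) -> str:
    """Convert a classic 4-character PDB ID to the extended pdb_XXXXXXXX form.

    Assumption: the extended ID is the lower-cased classic ID,
    zero-padded on the left to eight characters.
    """
    classic_id = classic_id.strip()
    if len(classic_id) != 4 or not classic_id.isalnum():
        raise ValueError(f"not a classic PDB ID: {classic_id!r}")
    return "pdb_" + classic_id.lower().rjust(8, "0")


def to_doi_url(classic_id: str) -> str:
    """Build the entry DOI URL in the pattern shown for this entry."""
    return f"https://doi.org/10.2210/pdb{classic_id.strip().upper()}/pdb"


if __name__ == "__main__":
    print(to_extended_id("7KUW"))
    print(to_doi_url("7kuw"))
```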

Classification: DE NOVO PROTEIN
Organism(s): synthetic construct
Expression System: Escherichia coli
Mutation(s): No

Deposited: 2020-11-25 Released: 2021-12-15
Deposition Author(s): Bera, A.K., Stewart, L., Kang, A.S., Baker, D.
Funding Organization(s): Howard Hughes Medical Institute (HHMI)

Experimental Data Snapshot

Method: X-RAY DIFFRACTION
Resolution: 2.43 Å
R-Value Free: 0.281 (Depositor), 0.290 (DCC)
R-Value Work: 0.249 (Depositor), 0.250 (DCC)
R-Value Observed: 0.252 (Depositor)
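
The difference between R-free and R-work is often inspected alongside the values themselves as a rough indicator of overfitting. As a small worked example using only the numbers listed above (the dictionary layout and the `r_gap` helper are illustrative assumptions, not part of the entry):

```python
# R-values as reported in this entry, keyed by who calculated them.
r_free = {"Depositor": 0.281, "DCC": 0.290}
r_work = {"Depositor": 0.249, "DCC": 0.250}


def r_gap(source: str) -> float:
    """Return R-free minus R-work for one source, rounded to 3 decimals."""
    return round(r_free[source] - r_work[source], 3)


if __name__ == "__main__":
    for source in r_free:
        print(f"{source}: R-free - R-work = {r_gap(source):.3f}")
```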

Starting Model: in silico

This is version 1.2 of the entry.

Literature

Large-scale design and refinement of stable proteins using sequence-only models.

Singer, J.M., Novotney, S., Strickland, D., Haddox, H.K., Leiby, N., Rocklin, G.J., Chow, C.M., Roy, A., Bera, A.K., Motta, F.C., Cao, L., Strauch, E.M., Chidyausiku, T.M., Ford, A., Ho, E., Zaitzeff, A., Mackenzie, C.O., Eramian, H., DiMaio, F., Grigoryan, G., Vaughn, M., Stewart, L.J., Baker, D., Klavins, E.

(2022) PLoS One 17: e0265020-e0265020

PubMed: 35286324
DOI: https://doi.org/10.1371/journal.pone.0265020
Primary Citation of Related Structures: 7KUW

PubMed Abstract:
Engineered proteins generally must possess a stable structure in order to achieve their designed function. Stable designs, however, are astronomically rare within the space of all possible amino acid sequences. As a consequence, many designs must be tested computationally and experimentally in order to find stable ones, which is expensive in terms of time and resources. Here we use a high-throughput, low-fidelity assay to experimentally evaluate the stability of approximately 200,000 novel proteins. These include a wide range of sequence perturbations, providing a baseline for future work in the field. We build a neural network model that predicts protein stability given only sequences of amino acids, and compare its performance to the assayed values. We also report another network model that is able to generate the amino acid sequences of novel stable proteins given requested secondary sequences. Finally, we show that the predictive model, despite weaknesses including a noisy data set, can be used to substantially increase the stability of both expert-designed and model-generated proteins.

Organizational Affiliation

Two Six Technologies, Arlington, Virginia, United States of America.

Macromolecule Content

Total Structure Weight: 7.19 kDa
Atom Count: 507
Modeled Residue Count: 62
Deposited Residue Count: 62
Unique protein chains: 1

Macromolecules

Entity ID: 1
Molecule	Chains	Sequence Length	Organism	Details
Sequence-Based Designed Protein nmt_0994_guided_02	A	62	synthetic construct	Mutation(s): 0

Experimental Data & Validation

Experimental Data

Space Group: P 4₃ 2₁ 2

Unit Cell:

Length (Å)	Angle (°)
a = 52.267	α = 90
b = 52.267	β = 90
c = 47.352	γ = 90
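
As a worked example, the cell parameters above can be combined with the total structure weight listed for this entry (7.19 kDa) to estimate the unit-cell volume, the Matthews coefficient, and an approximate solvent content. The sketch rests on two stated assumptions: one copy of chain A per asymmetric unit (the entry lists a single chain), and the common approximation solvent fraction ≈ 1 − 1.23 / V_M. The space group P 4₃ 2₁ 2 has 8 symmetry operators. All variable and function names are illustrative.

```python
import math


def cell_volume(a, b, c, alpha, beta, gamma):
    """Unit-cell volume in cubic angstroms (general formula, angles in degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)


# Cell parameters as listed in this entry.
volume = cell_volume(52.267, 52.267, 47.352, 90, 90, 90)

SYMMETRY_OPERATORS = 8      # P 4(3) 2(1) 2
COPIES_PER_ASU = 1          # assumption: a single chain A per asymmetric unit
MOLECULAR_WEIGHT_DA = 7190  # total structure weight, 7.19 kDa

# Matthews coefficient: cell volume per dalton of protein in the cell.
v_m = volume / (SYMMETRY_OPERATORS * COPIES_PER_ASU * MOLECULAR_WEIGHT_DA)

# Approximate solvent fraction (assumes a typical protein partial specific volume).
solvent_fraction = 1 - 1.23 / v_m

if __name__ == "__main__":
    print(f"Cell volume: {volume:.0f} cubic angstroms")
    print(f"Matthews coefficient: {v_m:.2f} cubic angstroms per Da")
    print(f"Estimated solvent content: {solvent_fraction:.0%}")
```

If the asymmetric unit held more than one copy, `COPIES_PER_ASU` would need to change and both derived values would shift accordingly.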

Software Package:

Software Name	Purpose
PHENIX	refinement
XDS	data reduction
XDS	data scaling
PHASER	phasing

Entry History & Funding Information

Deposition Data

Released Date: 2021-12-15

Deposition Author(s): Bera, A.K., Stewart, L., Kang, A.S., Baker, D.

Funding Organization	Location	Grant Number
Howard Hughes Medical Institute (HHMI)	United States	--

Revision History

Version 1.0: 2021-12-15
Type: Initial release
Version 1.1: 2022-05-04
Changes: Database references
Version 1.2: 2024-04-03
Changes: Data collection, Refinement description
