7KUW

High-throughput design and refinement of stable proteins using sequence-only models


Experimental Data Snapshot

  • Method: X-RAY DIFFRACTION
  • Resolution: 2.43 Å
  • R-Value Free: 0.281 
  • R-Value Work: 0.249 
  • R-Value Observed: 0.252 


This is version 1.2 of the entry.


Literature

Large-scale design and refinement of stable proteins using sequence-only models.

Singer, J.M., Novotney, S., Strickland, D., Haddox, H.K., Leiby, N., Rocklin, G.J., Chow, C.M., Roy, A., Bera, A.K., Motta, F.C., Cao, L., Strauch, E.M., Chidyausiku, T.M., Ford, A., Ho, E., Zaitzeff, A., Mackenzie, C.O., Eramian, H., DiMaio, F., Grigoryan, G., Vaughn, M., Stewart, L.J., Baker, D., Klavins, E.

(2022) PLoS One 17: e0265020

  • DOI: https://doi.org/10.1371/journal.pone.0265020
  • Primary Citation of Related Structures:  
    7KUW

  • PubMed Abstract: 

    Engineered proteins generally must possess a stable structure in order to achieve their designed function. Stable designs, however, are astronomically rare within the space of all possible amino acid sequences. As a consequence, many designs must be tested computationally and experimentally in order to find stable ones, which is expensive in terms of time and resources. Here we use a high-throughput, low-fidelity assay to experimentally evaluate the stability of approximately 200,000 novel proteins. These include a wide range of sequence perturbations, providing a baseline for future work in the field. We build a neural network model that predicts protein stability given only sequences of amino acids, and compare its performance to the assayed values. We also report another network model that is able to generate the amino acid sequences of novel stable proteins given requested secondary sequences. Finally, we show that the predictive model, despite weaknesses including a noisy data set, can be used to substantially increase the stability of both expert-designed and model-generated proteins.


  • Organizational Affiliation

    Two Six Technologies, Arlington, Virginia, United States of America.


Macromolecules
Entity ID: 1
  • Molecule: Sequence-Based Designed Protein nmt_0994_guided_0262
  • Organism: synthetic construct
  • Mutation(s): 0
Experimental Data & Validation

Experimental Data

  • Space Group: P 43 21 2
Unit Cell:
  • a = 52.267 Å, α = 90°
  • b = 52.267 Å, β = 90°
  • c = 47.352 Å, γ = 90°
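
The unit cell parameters above determine the cell volume. The following is a minimal sketch, not part of the entry itself, that applies the general triclinic volume formula; because all three angles are 90° here (consistent with the tetragonal space group P 43 21 2), it reduces to a × b × c. The function name `unit_cell_volume` is a hypothetical helper introduced only for illustration.

```python
import math


def unit_cell_volume(a, b, c, alpha_deg, beta_deg, gamma_deg):
    """Return the unit cell volume in cubic angstroms.

    Uses the general formula
    V = a*b*c*sqrt(1 - cos^2(alpha) - cos^2(beta) - cos^2(gamma)
                   + 2*cos(alpha)*cos(beta)*cos(gamma)),
    which reduces to a*b*c when all angles are 90 degrees.
    """
    ca = math.cos(math.radians(alpha_deg))
    cb = math.cos(math.radians(beta_deg))
    cg = math.cos(math.radians(gamma_deg))
    factor = 1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg
    return a * b * c * math.sqrt(factor)


# Cell parameters as listed for this entry (lengths in angstroms, angles in degrees).
volume = unit_cell_volume(52.267, 52.267, 47.352, 90.0, 90.0, 90.0)
print(f"Unit cell volume: {volume:.0f} cubic angstroms")
```

For this cell the result is roughly 1.29 × 10^5 Å³, the same as multiplying the three edge lengths directly.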
Software Package:
  • PHENIX: refinement
  • XDS: data reduction
  • XDS: data scaling
  • PHASER: phasing


Entry History & Funding Information

Funding
  • Funding Organization: Howard Hughes Medical Institute (HHMI)
  • Location: United States
  • Grant Number: --

Revision History

  • Version 1.0: 2021-12-15
    Type: Initial release
  • Version 1.1: 2022-05-04
    Changes: Database references
  • Version 1.2: 2024-04-03
    Changes: Data collection, Refinement description