7M0Q

Crystal structure of deep network hallucinated protein 0738_mod


Experimental Data Snapshot

  • Method: X-RAY DIFFRACTION
  • Resolution: 2.40 Å
  • R-Value Free: 0.260 
  • R-Value Work: 0.219 
  • R-Value Observed: 0.221 



This is version 1.1 of the entry.


Literature

De novo protein design by deep network hallucination.

Anishchenko, I., Pellock, S.J., Chidyausiku, T.M., Ramelot, T.A., Ovchinnikov, S., Hao, J., Bafna, K., Norn, C., Kang, A., Bera, A.K., DiMaio, F., Carter, L., Chow, C.M., Montelione, G.T., Baker, D.

(2021) Nature 600: 547-552

  • DOI: https://doi.org/10.1038/s41586-021-04184-w
  • Primary Citation of Related Structures:  
    7K3H, 7M0Q, 7M5T

  • PubMed Abstract: 

    There has been considerable recent progress in protein structure prediction using deep neural networks to predict inter-residue distances from amino acid sequences (refs. 1-3). Here we investigate whether the information captured by such networks is sufficiently rich to generate new folded proteins with sequences unrelated to those of the naturally occurring proteins used in training the models. We generate random amino acid sequences, and input them into the trRosetta structure prediction network to predict starting residue-residue distance maps, which, as expected, are quite featureless. We then carry out Monte Carlo sampling in amino acid sequence space, optimizing the contrast (Kullback-Leibler divergence) between the inter-residue distance distributions predicted by the network and background distributions averaged over all proteins. Optimization from different random starting points resulted in novel proteins spanning a wide range of sequences and predicted structures. We obtained synthetic genes encoding 129 of the network-'hallucinated' sequences, and expressed and purified the proteins in Escherichia coli; 27 of the proteins yielded monodisperse species with circular dichroism spectra consistent with the hallucinated structures. We determined the three-dimensional structures of three of the hallucinated proteins, two by X-ray crystallography and one by NMR, and these closely matched the hallucinated models. Thus, deep networks trained to predict native protein structures from their sequences can be inverted to design new proteins, and such networks and methods should contribute alongside traditional physics-based models to the de novo design of proteins with new functions.


  • Organizational Affiliation

    Department of Biochemistry, University of Washington, Seattle, WA, USA.
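
The sampling loop described in the abstract can be sketched in a few lines of Python. This is only a toy illustration, not the authors' implementation: `toy_predict` is a hypothetical stand-in for the trRosetta network, the background is assumed to be uniform rather than averaged over real proteins, and the fixed-temperature Metropolis rule, the bin count, and all parameter values are assumptions chosen for readability.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
N_BINS = 4  # toy number of distance bins (assumption)
BACKGROUND = [1.0 / N_BINS] * N_BINS  # uniform stand-in for the averaged background


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def toy_predict(seq):
    """Hypothetical stand-in for the network: one distance distribution per residue pair."""
    preds = []
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            a = AMINO_ACIDS.index(seq[i]) + 1
            b = AMINO_ACIDS.index(seq[j]) + 1
            # Arbitrary deterministic logits, turned into probabilities by a softmax.
            logits = [math.sin(0.37 * a * b * (k + 1)) for k in range(N_BINS)]
            norm = sum(math.exp(x) for x in logits)
            preds.append([math.exp(x) / norm for x in logits])
    return preds


def contrast(seq):
    """Mean KL divergence between predicted and background distributions."""
    preds = toy_predict(seq)
    return sum(kl_divergence(p, BACKGROUND) for p in preds) / len(preds)


def hallucinate(length=12, steps=300, temperature=0.05, seed=0):
    """Monte Carlo search in sequence space; returns (best_seq, start_score, best_score)."""
    rng = random.Random(seed)
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    score = start_score = contrast(seq)
    best_seq, best_score = list(seq), score
    for _ in range(steps):
        candidate = list(seq)
        candidate[rng.randrange(length)] = rng.choice(AMINO_ACIDS)  # single-site mutation
        new_score = contrast(candidate)
        # Metropolis rule: keep improvements, occasionally keep a worse sequence.
        if new_score >= score or rng.random() < math.exp((new_score - score) / temperature):
            seq, score = candidate, new_score
            if score > best_score:
                best_seq, best_score = list(seq), score
    return "".join(best_seq), start_score, best_score


if __name__ == "__main__":
    best, start, final = hallucinate()
    print(best, round(start, 3), round(final, 3))
```

Tracking the best sequence seen, rather than returning the final state, keeps the result from getting worse when a late uphill move is rejected or a downhill move is accepted.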


Macromolecules
Entity ID: 1
  • Molecule: Network hallucinated protein 0738_mod
  • Chains: A, B
  • Sequence Length: 121
  • Organism: synthetic construct
  • Details: Mutation(s): 0
Experimental Data & Validation

Experimental Data

  • Method: X-RAY DIFFRACTION
  • Resolution: 2.40 Å
  • R-Value Free: 0.260 
  • R-Value Work: 0.219 
  • R-Value Observed: 0.221 
  • Space Group: P 31
Unit Cell:
  • a = 46.291 Å, α = 90°
  • b = 46.291 Å, β = 90°
  • c = 82.492 Å, γ = 120°
Software Package:
  • PHENIX: refinement
  • ADSC: data collection
  • autoPROC: data processing
  • SCALEPACK: data scaling
  • PHASER: phasing
  • HKL-2000: data reduction
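
As a worked example of how the unit cell parameters listed above translate into a cell volume, the sketch below applies the general triclinic volume formula, which reduces to a·b·c·sin(γ) when α = β = 90°. The function name is illustrative and not part of any crystallographic package. The division by three assumes the three symmetry-equivalent positions of space group P 31.

```python
import math


def unit_cell_volume(a, b, c, alpha_deg, beta_deg, gamma_deg):
    """Unit cell volume in cubic angstroms from cell lengths (Å) and angles (degrees)."""
    ca = math.cos(math.radians(alpha_deg))
    cb = math.cos(math.radians(beta_deg))
    cg = math.cos(math.radians(gamma_deg))
    # General triclinic formula; valid for any crystal system.
    return a * b * c * math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)


if __name__ == "__main__":
    # Cell parameters reported for this entry.
    volume = unit_cell_volume(46.291, 46.291, 82.492, 90.0, 90.0, 120.0)
    # P 31 has three symmetry-equivalent positions per cell (assumed here).
    per_asymmetric_unit = volume / 3.0
    print(f"cell volume: {volume:.0f} Å^3")
    print(f"volume per asymmetric unit: {per_asymmetric_unit:.0f} Å^3")
```

For this entry the cell volume comes out at roughly 1.5 × 10^5 Å³, or about 5.1 × 10^4 Å³ per asymmetric unit, which holds the two chains A and B.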


Entry History & Funding Information

Deposition Data


  • Funding Organization: Department of Defense (DOD, United States)
  • Location: United States
  • Grant Number: --

Revision History

  • Version 1.0: 2021-12-29
    Type: Initial release
  • Version 1.1: 2024-04-03
    Changes: Data collection, Refinement description