7RGR | pdb_00007rgr

Deposited: 2021-07-15 Released: 2021-07-28
Deposition Author(s): Fraser, J.S., Holton, J.M., Olmos Jr., J.L., Greene, E.R.
Funding Organization(s): National Institutes of Health/National Institute of General Medical Sciences (NIH/NIGMS)

Experimental Data Snapshot

Starting Model: in silico


This is version 1.2 of the entry.

(2023) Nat Biotechnol

PubMed Abstract:
Deep-learning language models have shown promise in various biotechnological applications, including protein design and engineering. Here we describe ProGen, a language model that can generate protein sequences with a predictable function across large protein families, akin to generating grammatically and semantically correct natural language sentences on diverse topics. The model was trained on 280 million protein sequences from >19,000 families and is augmented with control tags specifying protein properties. ProGen can be further fine-tuned to curated sequences and tags to improve controllable generation performance of proteins from families with sufficient homologous samples. Artificial proteins fine-tuned to five distinct lysozyme families showed similar catalytic efficiencies as natural lysozymes, with sequence identity to natural proteins as low as 31.4%. ProGen is readily adapted to diverse protein families, as we demonstrate with chorismate mutase and malate dehydrogenase.
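The abstract reports sequence identity to natural proteins as low as 31.4%. As a rough illustration of what a percent-identity figure measures, the sketch below compares two sequences that are assumed to be already aligned (gaps written as "-"). The function name `percent_identity` is hypothetical, and the denominator used here (columns where neither sequence has a gap) is only one common convention. Other tools divide by the alignment length or by the length of the shorter sequence, and this page does not state which definition the study used.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences (hypothetical helper).

    Assumption: the denominator counts only columns where neither sequence
    has a gap; identical residues in those columns count as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    aligned_positions = 0
    for a, b in zip(aligned_a, aligned_b):
        # Skip columns where either sequence carries a gap character.
        if a == gap or b == gap:
            continue
        aligned_positions += 1
        if a == b:
            matches += 1
    if aligned_positions == 0:
        return 0.0
    return 100.0 * matches / aligned_positions


# Toy example: 4 of the 5 gap-free columns are identical.
print(percent_identity("MKV-LA", "MRVQLA"))
```

Because the choice of denominator can shift the reported value by several points, identity figures from different tools or papers are best compared only when they use the same convention.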


Macromolecule Content


Entity ID: 1
Molecule	Chains	Sequence Length	Organism	Mutation(s)	EC
Artificial protein L056	A [auth B], B [auth A]	168	synthetic construct	0	3.2.1.17

Ligands (2 unique)
ID	Chains	Name	Formula	InChI Key
NHE	C [auth B]	2-[N-CYCLOHEXYLAMINO]ETHANE SULFONIC ACID	C₈H₁₇NO₃S	MKWKNSIESPFAQN-UHFFFAOYSA-N
CL	D [auth A]	CHLORIDE ION	Cl	VEXZGXHMUGYJMC-UHFFFAOYSA-M

Diffraction Data:

Space Group: P 2₁ 2₁ 2₁

Unit Cell:

Software Package:

Entry History & Funding Information

Funding Organization	Location	Grant Number
National Institutes of Health/National Institute of General Medical Sciences (NIH/NIGMS)	United States	GM123159
National Institutes of Health/National Institute of General Medical Sciences (NIH/NIGMS)	United States	GM124149
National Institutes of Health/National Institute of General Medical Sciences (NIH/NIGMS)	United States	GM124169