Unsupervised evolution of protein and antibody complexes with a structure-informed language model

Affiliations
  • 1. Stanford Biophysics Program, Stanford University School of Medicine, Stanford, CA 94305, USA.
  • 2. Stanford Medical Scientist Training Program, Stanford University School of Medicine, Stanford, CA 94305, USA.
  • 3. Sarafan ChEM-H, Stanford University, Stanford, CA 94305, USA.
  • 4. Department of Biochemistry, Stanford University School of Medicine, Stanford, CA 94305, USA.
  • 5. Chan Zuckerberg Biohub, San Francisco, CA 94158, USA.

Abstract

Large language models trained on sequence information alone can learn high-level principles of protein design. However, beyond sequence, the three-dimensional structures of proteins determine their specific function, activity, and evolvability. Here, we show that a general protein language model augmented with protein structure backbone coordinates can guide evolution for diverse proteins without the need to model individual functional tasks. We also demonstrate that ESM-IF1, which was only trained on single-chain structures, can be extended to engineer protein complexes. Using this approach, we screened about 30 variants of two therapeutic clinical antibodies used to treat severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection. We achieved up to 25-fold improvement in neutralization and 37-fold improvement in affinity against antibody-escaped viral variants of concern BQ.1.1 and XBB.1.5, respectively. These findings highlight the advantage of integrating structural information to identify efficient protein evolution trajectories without requiring any task-specific training data.
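One plausible way to picture how a structure-conditioned language model could "guide evolution" is to rank every single-residue substitution by how much more likely the model finds it than the wild-type residue, given the backbone. The sketch below is only an illustration under stated assumptions: `rank_substitutions`, `toy_log_probs`, and the probability values are hypothetical stand-ins, and no real ESM-IF1 call is made. It is not presented as the authors' pipeline.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_log_probs(wild_type, favored):
    """Build hypothetical per-position log-probabilities (NOT real model output).

    `favored` maps a 0-based position to a residue that the imagined
    structure-conditioned model prefers over the wild-type residue.
    """
    table = []
    for i, wt_aa in enumerate(wild_type):
        if i in favored:
            # Favored residue gets 0.6, wild type 0.2, the other 18 share 0.2.
            probs = {aa: 0.2 / 18 for aa in AMINO_ACIDS}
            probs[wt_aa] = 0.2
            probs[favored[i]] = 0.6
        else:
            # Wild type gets 0.5, the other 19 residues share 0.5.
            probs = {aa: 0.5 / 19 for aa in AMINO_ACIDS}
            probs[wt_aa] = 0.5
        table.append({aa: math.log(p) for aa, p in probs.items()})
    return table


def rank_substitutions(wild_type, log_probs, top_k=5):
    """Rank single substitutions by log-likelihood ratio against wild type.

    log_probs[i][aa] is assumed to be the model's log-probability of residue
    `aa` at position i, conditioned on the backbone structure.
    """
    candidates = []
    for i, wt_aa in enumerate(wild_type):
        for aa in AMINO_ACIDS:
            if aa == wt_aa:
                continue
            score = log_probs[i][aa] - log_probs[i][wt_aa]
            # Mutation names use 1-based positions, e.g. K2L.
            candidates.append((score, f"{wt_aa}{i + 1}{aa}"))
    # Highest score first; ties broken alphabetically by mutation name.
    candidates.sort(key=lambda c: (-c[0], c[1]))
    return [(name, score) for score, name in candidates[:top_k]]


if __name__ == "__main__":
    wt = "MKV"  # toy sequence, not a real antibody
    print(rank_substitutions(wt, toy_log_probs(wt, {1: "L"}), top_k=3))
```

In practice the log-probabilities would come from the model rather than a hand-built table, and only a small number of top-ranked substitutions (on the order of the roughly 30 variants mentioned above) would be carried forward to experimental screening.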
