The simplicity of protein sequence-function relationships

Affiliations
  • 1Committee on Genetics, Genomics, and Systems Biology, University of Chicago, Chicago, IL, USA.
  • 2Center for RNA Research, Institute for Basic Science, Seoul, Republic of Korea.
  • 3Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA.
  • 4Department of Biological Sciences, Purdue University, West Lafayette, IN, USA.
  • 5Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA. joet1@uchicago.edu.
  • 6Department of Human Genetics, University of Chicago, Chicago, IL, USA. joet1@uchicago.edu.

Abstract

How complex are the rules by which a protein’s sequence determines its function? High-order epistatic interactions among residues are thought to be pervasive, suggesting an idiosyncratic and unpredictable sequence-function relationship. But many prior studies may have overestimated epistasis, because they analyzed sequence-function relationships relative to a single reference sequence (which causes measurement noise and local idiosyncrasies to snowball into high-order epistasis) or they did not fully account for global nonlinearities. Here we present a reference-free method that jointly infers specific epistatic interactions and global nonlinearity using a bird’s-eye view of sequence space. This technique yields the simplest explanation of sequence-function relationships and is more robust than existing methods to measurement noise, missing data, and model misspecification. We reanalyze 20 experimental datasets and find that context-independent amino acid effects and pairwise interactions, along with a simple nonlinearity to account for limited dynamic range, explain a median of 96% of phenotypic variance and over 92% in every case. Only a tiny fraction of genotypes are strongly affected by higher-order epistasis. Sequence-function relationships are also sparse: a minuscule fraction of amino acids and interactions account for 90% of phenotypic variance. Sequence-function causality across these datasets is therefore simple, opening the way for tractable approaches to characterize proteins’ genetic architecture.
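
As a rough illustration of the "bird's-eye view" idea, the sketch below computes context-independent (first-order) effects on a toy two-site library and reports how much phenotypic variance they explain. It is not the authors' implementation. It assumes that a state's reference-free effect can be taken as the mean phenotype of all genotypes carrying that state minus the global mean, and it omits the pairwise terms and the global nonlinearity described in the abstract. The function names and the toy phenotypes are hypothetical, invented for illustration only.

```python
from statistics import mean


def reference_free_effects(phenotypes):
    """Estimate the global mean and first-order effects from a complete library.

    phenotypes: dict mapping a genotype (tuple of states, one per site)
    to its measured phenotype. ASSUMPTION: an effect is defined relative to
    the average over all genotypes, not relative to one reference sequence.
    """
    genotypes = list(phenotypes)
    n_sites = len(genotypes[0])
    global_mean = mean(phenotypes.values())
    first_order = {}
    for site in range(n_sites):
        for state in sorted({g[site] for g in genotypes}):
            # Average over every genotype that carries this state at this site.
            carriers = [phenotypes[g] for g in genotypes if g[site] == state]
            first_order[(site, state)] = mean(carriers) - global_mean
    return global_mean, first_order


def predict_first_order(genotype, global_mean, first_order):
    """Additive prediction: global mean plus the effect of each state."""
    return global_mean + sum(first_order[(i, s)] for i, s in enumerate(genotype))


def variance_explained(phenotypes, global_mean, first_order):
    """Fraction of phenotypic variance captured by the first-order model."""
    total = sum((y - global_mean) ** 2 for y in phenotypes.values())
    residual = sum(
        (y - predict_first_order(g, global_mean, first_order)) ** 2
        for g, y in phenotypes.items()
    )
    return 1 - residual / total


# Hypothetical toy library: two sites, two states each, with mild epistasis
# (the double mutant "BB" is higher than an additive model would predict).
toy = {("A", "A"): 0.0, ("A", "B"): 1.0, ("B", "A"): 2.0, ("B", "B"): 5.0}

global_mean, first_order = reference_free_effects(toy)
r2 = variance_explained(toy, global_mean, first_order)
print(f"global mean = {global_mean}, first-order R^2 = {r2:.3f}")
```

On this toy library the first-order terms alone account for 13/14 of the variance; the remainder is the pairwise interaction that a second-order term would capture. Because every effect is an average over all genotypes, noise in any single measurement is diluted rather than propagated into higher-order terms, which is the intuition the abstract gives for preferring a reference-free formulation.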
