Inferring Protein Sequence-Function Relationships with Large-Scale Positive-Unlabeled Learning.

Hyebin Song¹, Bennett J Bremer², Emily C Hinds²

  • ¹Department of Statistics, The Pennsylvania State University, State College, PA 16802, USA; Department of Statistics, University of Wisconsin-Madison, Madison, WI 53706, USA.

Cell Systems | November 19, 2020
Summary
This summary is machine-generated.

We developed a novel machine learning approach that uses positive-unlabeled (PU) learning to predict protein sequence-function relationships from deep mutational scanning (DMS) data. The method accurately identifies key residues and enables the design of proteins with improved function.

Keywords:
deep mutational scanning; positive-unlabeled learning; protein engineering; protein sequence-function relationships; statistical learning; supervised learning

Area of Science:

  • Computational biology
  • Protein engineering
  • Machine learning

Background:

  • Machine learning (ML) offers a powerful way to link protein sequence to function without deep mechanistic insight.
  • Supervised ML struggles with large-scale deep mutational scanning (DMS) data due to high dimensionality, correlated variables, experimental noise, missing data, and a lack of negative examples.
  • Accurately inferring sequence-function relationships is crucial for protein design and engineering.
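To make the dimensionality point concrete, here is a minimal Python sketch of one common way to turn protein variants into model features: a per-position one-hot encoding. The summary does not say which encoding the paper uses, so treat this choice as an assumption; the sequences are made up for illustration.

```python
# Hypothetical illustration: per-position one-hot encoding of protein variants.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical amino acids
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(sequence: str) -> list[int]:
    """Return a flat 0/1 vector with one block of 20 features per position."""
    features = [0] * (len(sequence) * len(AMINO_ACIDS))
    for position, aa in enumerate(sequence):
        features[position * len(AMINO_ACIDS) + AA_INDEX[aa]] = 1
    return features


if __name__ == "__main__":
    wild_type = "MKTAYIAK"  # made-up 8-residue sequence
    variant = "MKTAYLAK"    # single I -> L substitution at residue 6
    x_wt, x_var = one_hot(wild_type), one_hot(variant)
    print(len(x_wt))                                 # 160 = 8 positions * 20 amino acids
    print(sum(a != b for a, b in zip(x_wt, x_var)))  # 2: one feature turns off, one turns on
```

A 300-residue protein already yields 6,000 such features before any pairwise terms, and variants drawn from one library share most of their nonzero entries, which is where the high dimensionality and correlated variables come from.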

Purpose of the Study:

  • To develop a robust machine learning framework for inferring sequence-function relationships from large-scale, complex DMS data.
  • To address the challenge of missing negative sequence data in typical DMS experiments.
  • To enable the prediction of key residues governing protein structure and function.
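Why missing negatives matter can be seen in a generic case-control PU setup. This derivation is an assumption about the setting, not necessarily the paper's formulation: the $n_\ell$ sequences recovered after selection are taken to be a random sample of functional sequences ($s = 1$), the $n_u$ sequences from the unselected library are a random sample of all sequences ($s = 0$), a fraction $\pi$ of which are functional, and $\eta(x) = P(y = 1 \mid x)$ is the quantity of interest.

```latex
\begin{aligned}
P(s=1 \mid x) &= \frac{n_\ell\, p(x \mid y=1)}{n_\ell\, p(x \mid y=1) + n_u\, p(x)},
\qquad p(x \mid y=1) = \frac{\eta(x)\, p(x)}{\pi},\\[4pt]
\frac{P(s=1 \mid x)}{P(s=0 \mid x)} &= \frac{n_\ell}{\pi\, n_u}\,\eta(x)
\quad\Longrightarrow\quad
\eta(x) = \frac{\pi\, n_u}{n_\ell}\cdot\frac{P(s=1 \mid x)}{1 - P(s=1 \mid x)}.
\end{aligned}
```

A classifier that simply treats unlabeled sequences as negatives estimates $P(s = 1 \mid x)$, not $\eta(x)$. Because the odds are monotone in $\eta(x)$, such a classifier can still rank sequences, but its probabilities are biased, and recovering $\eta(x)$ requires knowing or estimating $\pi$.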

Main Methods:

  • Developed a novel positive-unlabeled (PU) learning framework tailored for sequence-function inference.
  • Applied the PU learning method to ten diverse large-scale DMS datasets covering various protein types and library designs.
  • Validated the model's predictive performance and its ability to identify critical sequence determinants.
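The summary does not spell out the paper's estimator, so the following is only a minimal sketch of one generic case-control PU approach, not the authors' implementation. It fits a logistic classifier that separates sequences observed after selection (labeled, s = 1) from the unselected library (unlabeled, s = 0), then rescales the classifier's odds by π·n_u/n_l, where π is the fraction of functional sequences in the unlabeled library and is assumed known here. The toy data, the function names (`fit_logistic`, `pu_functional_probability`), and the hyperparameters are hypothetical; the fit uses plain batch gradient descent so the example needs only the standard library.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logit(x, w, b):
    """Linear score w . x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def fit_logistic(X, s, lr=0.5, epochs=2000, l2=1e-3):
    """Fit P(s=1 | x) by batch gradient descent on L2-regularized log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, si in zip(X, s):
            err = sigmoid(logit(xi, w, b)) - si
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def pu_functional_probability(x, w, b, prior, n_labeled, n_unlabeled):
    """Case-control PU correction: P(y=1|x) = prior * n_u / n_l * odds(s=1|x), capped at 1."""
    odds = math.exp(logit(x, w, b))
    return min(1.0, prior * n_unlabeled / n_labeled * odds)


if __name__ == "__main__":
    # Hypothetical toy data: one position; residue A ([1, 0]) is functional, residue B ([0, 1]) is not.
    unlabeled = [[1, 0]] * 5 + [[0, 1]] * 5  # unselected library: half functional, so prior = 0.5
    labeled = [[1, 0]] * 10                  # sequences recovered after selection (positives only)
    X = labeled + unlabeled
    s = [1] * len(labeled) + [0] * len(unlabeled)
    w, b = fit_logistic(X, s)
    for name, x in [("A", [1, 0]), ("B", [0, 1])]:
        naive = sigmoid(logit(x, w, b))  # treats unlabeled sequences as negatives
        corrected = pu_functional_probability(x, w, b, 0.5, len(labeled), len(unlabeled))
        print(f"{name}: naive={naive:.2f} PU-corrected={corrected:.2f}")
```

On this toy data the naive score for residue A stays near 2/3 (the share of A sequences that carry a label) even though every A sequence is functional by construction, while the corrected estimate is close to 1; residue B stays near 0 under both. In practice π would itself have to be estimated, and a real DMS model would have far more features and use a proper optimizer.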

Main Results:

  • The PU learning framework demonstrated excellent predictive performance across multiple datasets.
  • The model successfully identified key residues that significantly influence protein structure and function.
  • The approach proved effective despite the inherent challenges of DMS data, such as missing negative examples.
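One illustrative way to turn a fitted additive model into a ranking of key residues is sketched below. This is an assumption, since the summary does not say how the paper scores positions. The sketch assumes one coefficient per (position, amino acid) pair, laid out 20 per position as in a one-hot encoding, and ranks positions by the spread between their largest and smallest coefficient. The function names and weights are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def position_importance(weights: list[float], length: int) -> list[float]:
    """Spread (max - min) of the coefficients at each position."""
    k = len(AMINO_ACIDS)
    if len(weights) != length * k:
        raise ValueError("expected one weight per (position, amino acid) pair")
    return [max(weights[p * k:(p + 1) * k]) - min(weights[p * k:(p + 1) * k])
            for p in range(length)]


def top_positions(weights: list[float], length: int, n_top: int) -> list[int]:
    """Return the 1-based residue numbers with the largest spread, highest first."""
    scores = position_importance(weights, length)
    ranked = sorted(range(length), key=lambda p: scores[p], reverse=True)
    return [p + 1 for p in ranked[:n_top]]


if __name__ == "__main__":
    # Hypothetical coefficients for a 4-residue protein.
    w = [0.0] * (4 * len(AMINO_ACIDS))
    w[0 * 20 + AA_INDEX["A"]] = 0.3   # residue 1: small effect
    w[1 * 20 + AA_INDEX["W"]] = 2.0   # residue 2: W strongly favored
    w[3 * 20 + AA_INDEX["P"]] = -1.5  # residue 4: P strongly disfavored
    print(top_positions(w, 4, 2))     # [2, 4]
```

Using the spread rather than a single coefficient means a position counts as important whenever some substitutions there are predicted to be much better or much worse than others, regardless of sign.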

Conclusions:

  • Positive-unlabeled (PU) learning provides an effective solution for analyzing large-scale deep mutational scanning (DMS) data.
  • The developed statistical sequence-function model can accurately pinpoint critical residues for protein engineering.
  • This methodology facilitates the design of novel proteins with enhanced stability and function, as demonstrated by the creation of highly stabilized enzymes.
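The design step can be illustrated under a strong simplifying assumption: if the predicted score is a sum of per-position coefficients (no epistasis), the best variant with at most k substitutions is obtained by applying the k single substitutions with the largest positive predicted gain over wild type. This is a hypothetical sketch (`design_variant`, `additive_score`, and the weights are invented), not the procedure used to build the stabilized enzymes.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def additive_score(seq: str, weights: list[float]) -> float:
    """Predicted score = sum of the coefficient for the residue at each position."""
    k = len(AMINO_ACIDS)
    return sum(weights[p * k + AA_INDEX[aa]] for p, aa in enumerate(seq))


def design_variant(wild_type: str, weights: list[float], max_mutations: int) -> str:
    """Apply up to max_mutations substitutions with the largest predicted gain."""
    k = len(AMINO_ACIDS)
    candidates = []  # (gain, position, best amino acid)
    for p, wt_aa in enumerate(wild_type):
        block = weights[p * k:(p + 1) * k]
        best = max(range(k), key=block.__getitem__)
        gain = block[best] - block[AA_INDEX[wt_aa]]
        if gain > 0:  # only keep substitutions predicted to help
            candidates.append((gain, p, AMINO_ACIDS[best]))
    candidates.sort(reverse=True)
    seq = list(wild_type)
    for _, p, aa in candidates[:max_mutations]:
        seq[p] = aa
    return "".join(seq)


if __name__ == "__main__":
    w = [0.0] * (4 * len(AMINO_ACIDS))  # hypothetical coefficients, 4 residues
    w[1 * 20 + AA_INDEX["W"]] = 2.0     # K2W predicted to help most
    w[2 * 20 + AA_INDEX["S"]] = 0.5     # T3S predicted to help a little
    w[3 * 20 + AA_INDEX["P"]] = -1.5    # A4P predicted to hurt, so never chosen
    for k_max in (1, 2):
        design = design_variant("MKTA", w, k_max)
        print(k_max, design, additive_score(design, w))
```

Real designs also have to contend with epistasis, where the effect of one substitution depends on others, so the greedy choice is only guaranteed optimal under the additive assumption.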