



Embeddings from protein language models predict conservation and variant effects.

Céline Marquet1,2, Michael Heinzinger3,4, Tobias Olenyi3,4

  • 1Department of Informatics, Bioinformatics and Computational Biology - i12, TUM-Technical University of Munich, Boltzmannstr. 3, Garching, 85748, Munich, Germany. celine.marquet@tum.de.

Human Genetics | December 30, 2021
Summary
This summary is machine-generated.

Protein language models (pLMs) can predict the effects of single amino acid variants (SAVs) on protein function without multiple sequence alignments (MSAs). The resulting method, VESPA, is competitive with state-of-the-art approaches while requiring less computation time and energy.


Area of Science:

  • Computational Biology
  • Bioinformatics
  • Protein Engineering

Background:

  • Emerging SARS-CoV-2 variants necessitate tools for predicting single amino acid variant (SAV) effects on protein function.
  • Deep Mutational Scanning (DMS) provides extensive data but poses analytical challenges.
  • Protein Language Models (pLMs) leverage deep learning and large sequence databases to understand protein variations.

Purpose of the Study:

  • To develop and evaluate a novel method for predicting SAV effects on protein function using pLM embeddings without multiple sequence alignments (MSAs).
  • To assess the performance of this new approach against existing state-of-the-art (SOTA) methods.
  • To demonstrate the computational efficiency and broad applicability of the developed method.

Main Methods:

  • Utilized pLM representations (embeddings) to predict sequence conservation and SAV effects, bypassing the need for MSAs.
  • Developed the Variant Effect Score Prediction without Alignments (VESPA) model, an ensemble logistic regression integrating conservation predictions, BLOSUM62 scores, and pLM mask reconstruction probabilities.
  • Compared VESPA's predictions against established methods (ESM-1v, DeepSequence, GEMME) using a standard set of 39 DMS experiments.
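
As a rough illustration of how the three inputs named above could be combined, the following Python sketch scores one variant with a single logistic regression. It is not the published VESPA model: the weights, the bias, the function names, and the use of one regressor instead of an ensemble are all assumptions made for this example, and the log-odds definition (mutant versus wild-type reconstruction probability) is likewise an assumed formulation.

```python
import math

# Hypothetical coefficients, chosen only so the example behaves sensibly.
# They are NOT the trained VESPA parameters, which this summary does not give.
ASSUMED_WEIGHTS = {"conservation": 2.0, "blosum62": -0.3, "log_odds": -0.8}
ASSUMED_BIAS = -1.0


def mask_log_odds(p_wildtype: float, p_mutant: float) -> float:
    """Log-odds of the mutant versus the wild-type residue at a masked position.

    Both arguments are probabilities a pLM assigns when the position is masked
    (assumed formulation; negative values mean the mutant is less expected).
    """
    if not (0.0 < p_wildtype <= 1.0 and 0.0 < p_mutant <= 1.0):
        raise ValueError("probabilities must lie in (0, 1]")
    return math.log(p_mutant / p_wildtype)


def effect_probability(conservation: float, blosum62: int, log_odds: float,
                       weights=ASSUMED_WEIGHTS, bias=ASSUMED_BIAS) -> float:
    """Combine the three features with a logistic function.

    conservation: predicted conservation of the residue, scaled to [0, 1]
    blosum62:     BLOSUM62 substitution score for wild type -> mutant
    log_odds:     output of mask_log_odds for the same substitution
    Returns a value in (0, 1); higher means a stronger predicted effect.
    """
    z = (bias
         + weights["conservation"] * conservation
         + weights["blosum62"] * blosum62
         + weights["log_odds"] * log_odds)
    return 1.0 / (1.0 + math.exp(-z))


# Illustrative inputs: a conserved position with an unfavourable substitution,
# and a variable position with a favourable one.
likely_effect = effect_probability(0.9, -2, mask_log_odds(0.6, 0.01))
likely_neutral = effect_probability(0.1, 3, mask_log_odds(0.3, 0.25))
```

With these assumed coefficients, high conservation, a negative BLOSUM62 score, and a strongly negative log-odds value all push the score toward 1, which is the qualitative behaviour one would expect from such a combination.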

Main Results:

  • pLM embeddings alone achieved residue conservation prediction accuracy comparable to MSA-based methods like ConSeq.
  • VESPA accurately predicted SAV effect magnitude without prior optimization on DMS data.
  • VESPA demonstrated competitive performance against SOTA MSA-based methods on both Spearman and Pearson correlation.
  • The embedding-based approach significantly reduced computational and energy costs compared to MSA-dependent methods.
  • Predicted SAV effects for the entire human proteome in under 40 minutes on a single GPU.
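
The Spearman and Pearson correlations mentioned above compare predicted effect scores with experimental DMS measurements. A minimal, standard-library Python sketch of both metrics follows; the function names and the toy numbers are assumptions for illustration and are not taken from the study's data or code.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: linear agreement between two score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data (made up): predictions that order the variants correctly but are
# not linearly related to the measurements.
predicted = [1.0, 2.0, 3.0, 4.0]
measured = [1.0, 4.0, 9.0, 16.0]
rho = spearman(predicted, measured)
r = pearson(predicted, measured)
```

Spearman depends only on the ordering of variants, so it rewards a method that ranks effects correctly even when its scores are on a different scale from the assay, whereas Pearson also penalizes non-linear relationships.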

Conclusions:

  • pLM embeddings offer a powerful and efficient alternative to MSAs for predicting SAV effects on protein function.
  • VESPA provides a competitive and computationally inexpensive SOTA solution for variant effect prediction.
  • The developed methods and datasets are publicly available, facilitating broader research and application.