Leveraging protein language models for accurate multiple sequence alignments.

Claire D McWhite¹, Isabel Armour-Garb²,³, Mona Singh¹,³

  • ¹Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, USA; cmcwhite@princeton.edu, mona@cs.princeton.edu.

Genome Research | July 6, 2023
Summary
This summary is machine-generated.

This study introduces a new multiple sequence alignment (MSA) method built on protein language models. By clustering and ordering contextual amino acid embeddings instead of relying on traditional alignment steps, it achieves higher accuracy for proteins with low sequence identity.

Area of Science:

  • Bioinformatics
  • Computational Biology
  • Structural Biology

Background:

  • Multiple sequence alignment (MSA) is crucial for understanding protein sequence and function.
  • Traditional MSA methods lose accuracy on proteins with low pairwise sequence identity (the so-called twilight zone, roughly 20 to 35% identity).
  • Protein language models offer a new approach: they generate contextual embeddings, one vector per residue, that capture each amino acid's properties in the context of its surrounding sequence.
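The twilight zone is defined in terms of pairwise percent identity. The Python sketch below shows one way to compute it, assuming identity is measured only over columns where neither aligned row has a gap (other conventions divide by alignment length or by the shorter sequence) and assuming the commonly cited 20 to 35% band. `percent_identity` and `in_twilight_zone` are hypothetical helper names, not part of the paper.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percent identity of two aligned rows, counting only gap-free columns.

    Assumption: '-' marks a gap, and the denominator is the number of
    columns where both rows carry a residue.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    pairs = [(x, y) for x, y in zip(row_a, row_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    same = sum(x == y for x, y in pairs)
    return 100.0 * same / len(pairs)


def in_twilight_zone(pid: float, low: float = 20.0, high: float = 35.0) -> bool:
    """True if identity falls in the assumed twilight-zone band (hypothetical thresholds)."""
    return low <= pid <= high
```

For example, `percent_identity("MKLV", "MRIV")` returns 50.0: two of the four gap-free columns match.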

Purpose of the Study:

  • To develop a novel multiple sequence alignment (MSA) method leveraging protein language models.
  • To improve alignment accuracy for proteins with low sequence identity.
  • To circumvent limitations of traditional MSA algorithms.

Main Methods:

  • Clustering and ordering of amino acid contextual embeddings derived from protein language models.
  • Development of a novel MSA approach based on semantic consistency of protein groups.
  • Avoidance of traditional MSA components like guide trees, pairwise alignments, gap penalties, and substitution matrices.
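The methods above build an MSA by clustering and ordering per-residue embeddings, with no guide tree, pairwise alignment, gap penalty, or substitution matrix. The Python sketch below is a toy illustration of that idea under our own assumptions, not the authors' algorithm. Embeddings are supplied as plain lists of floats (in practice a protein language model would produce them). Residues are clustered by reciprocal best cosine hits between sequences, which compares sequences two at a time but never runs a dynamic-programming alignment, using a hypothetical `min_sim` cutoff. The clusters are then ordered into columns with a topological sort. `cosine` and `align_by_embeddings` are hypothetical names.

```python
import heapq
import math
from typing import Dict, List, Sequence, Set, Tuple

Residue = Tuple[int, int]  # (sequence index, residue position)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def align_by_embeddings(seqs: List[str], embs: List[List[List[float]]],
                        min_sim: float = 0.5) -> List[str]:
    """Toy MSA from per-residue embeddings; embs[s][p] is the vector for seqs[s][p]."""
    parent: Dict[Residue, Residue] = {}
    members: Dict[Residue, Set[Residue]] = {}
    for s, seq in enumerate(seqs):
        for p in range(len(seq)):
            parent[(s, p)] = (s, p)
            members[(s, p)] = {(s, p)}

    def find(r: Residue) -> Residue:
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path halving
            r = parent[r]
        return r

    def union(x: Residue, y: Residue) -> None:
        rx, ry = find(x), find(y)
        if rx == ry:
            return
        # Refuse merges that would put two residues of one sequence in one column.
        if {s for s, _ in members[rx]} & {s for s, _ in members[ry]}:
            return
        parent[ry] = rx
        members[rx] |= members.pop(ry)

    # Step 1: cluster residues via reciprocal best cosine hits between sequence pairs.
    for a in range(len(seqs)):
        for b in range(a + 1, len(seqs)):
            if not embs[a] or not embs[b]:
                continue
            sim = [[cosine(u, v) for v in embs[b]] for u in embs[a]]
            for i, row in enumerate(sim):
                j = max(range(len(row)), key=row.__getitem__)
                i_back = max(range(len(sim)), key=lambda k: sim[k][j])
                if i_back == i and row[j] >= min_sim:
                    union((a, i), (b, j))

    # Step 2: order clusters so every sequence still reads left to right.
    roots = {find(r) for r in list(parent)}
    succ: Dict[Residue, Set[Residue]] = {r: set() for r in roots}
    indeg: Dict[Residue, int] = {r: 0 for r in roots}
    for s, seq in enumerate(seqs):
        for p in range(len(seq) - 1):
            x, y = find((s, p)), find((s, p + 1))
            if y not in succ[x]:
                succ[x].add(y)
                indeg[y] += 1
    heap = [(min(p for _, p in members[r]), r) for r in roots if indeg[r] == 0]
    heapq.heapify(heap)
    order: List[Residue] = []
    while heap:
        _, r = heapq.heappop(heap)
        order.append(r)
        for nxt in succ[r]:
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                heapq.heappush(heap, (min(p for _, p in members[nxt]), nxt))
    if len(order) != len(roots):
        raise ValueError("clusters cross each other; no consistent column order")

    # Step 3: one column per cluster, '-' where a sequence has no residue in it.
    rows = [["-"] * len(order) for _ in seqs]
    for col, r in enumerate(order):
        for s, p in members[r]:
            rows[s][col] = seqs[s][p]
    return ["".join(row) for row in rows]


if __name__ == "__main__":
    # Toy one-hot "embeddings": A, C, D, E each get their own axis.
    toy = [[[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]],
           [[1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]]
    print(align_by_embeddings(["ACDE", "ADE"], toy))  # ['ACDE', 'A-DE']
```

The unmatched C forms a singleton cluster, so it becomes its own column with a gap in the second sequence; no gap penalty is involved. The sketch simply raises `ValueError` when clusters conflict in order, whereas a practical method would need a strategy to resolve such conflicts.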

Main Results:

  • The new MSA method achieves higher accuracy than traditional methods on structurally similar proteins with low amino acid similarity.
  • The approach effectively utilizes information from contextual embeddings.
  • Successful alignment of protein groups based on semantic consistency.
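The summary does not state how alignment accuracy was scored. One widely used measure for benchmarking an MSA against a structure-based reference alignment is the sum-of-pairs (SP) score: the fraction of residue pairs aligned in the reference that the test alignment also aligns. A minimal Python sketch, assuming both alignments contain the same ungapped sequences in the same row order and use '-' for gaps; `aligned_pairs` and `sp_score` are hypothetical names.

```python
from itertools import combinations
from typing import List, Set, Tuple

Pair = Tuple[Tuple[int, int], Tuple[int, int]]  # ((seq, pos), (seq, pos))


def aligned_pairs(rows: List[str]) -> Set[Pair]:
    """Collect every pair of residues that share a column in the alignment."""
    if not rows:
        return set()
    counters = [0] * len(rows)  # next ungapped residue index in each row
    pairs: Set[Pair] = set()
    for col in range(len(rows[0])):
        present = []
        for s, row in enumerate(rows):
            if row[col] != "-":
                present.append((s, counters[s]))
                counters[s] += 1
        pairs.update(combinations(present, 2))
    return pairs


def sp_score(test: List[str], reference: List[str]) -> float:
    """Fraction of reference residue pairs that the test alignment reproduces."""
    ref = aligned_pairs(reference)
    if not ref:
        return 0.0
    return len(aligned_pairs(test) & ref) / len(ref)
```

For instance, scoring `["ACDE", "AD-E"]` against the reference `["ACDE", "A-DE"]` gives 2/3: the A and E pairs are recovered, but D is placed in the wrong column.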

Conclusions:

  • Protein language models provide a powerful new source of information for MSA.
  • The proposed method offers a more accurate alternative for aligning challenging protein sets.
  • This approach is anticipated to be a fundamental component of future MSA algorithms.