



Protein language models trained on multiple sequence alignments learn phylogenetic relationships.

Umberto Lupo1,2, Damiano Sgarbossa3,4, Anne-Florence Bitbol5,6

  • 1Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), CH-1015, Lausanne, Switzerland. umberto.lupo@epfl.ch.

Nature Communications | October 22, 2022
PubMed
Summary
This summary is machine-generated.

Protein language models that take multiple sequence alignments (MSAs) as input capture detailed phylogenetic relationships. They also separate coevolutionary signals from the phylogenetic correlations that reflect historical contingency, which makes their contact predictions more resilient to phylogenetic noise.


Area of Science:

  • Computational Biology
  • Bioinformatics
  • Machine Learning

Background:

  • Self-supervised neural language models with attention are advancing biological sequence analysis.
  • Protein language models like MSA Transformer utilize multiple sequence alignments (MSAs) for prediction tasks.
  • Previous work showed success in contact prediction using row attentions in MSA Transformer.

Purpose of the Study:

  • To investigate whether column attentions in MSA Transformer encode phylogenetic relationships.
  • To determine if MSA-based language models can differentiate coevolutionary signals from phylogenetic noise.
  • To assess the robustness of unsupervised contact prediction against phylogenetic noise.
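The contact-prediction task named in the last aim can be made concrete with a short sketch. A common way to turn Potts couplings into contact scores is to take the Frobenius norm of each coupling block and apply the average product correction (APC). The function names, the array layout (L, L, q, q), and the top-k precision metric below are illustrative assumptions; the study's exact scoring pipeline is not described on this page.

```python
import numpy as np

def apc_contact_scores(J):
    """Contact scores from Potts couplings J of assumed shape (L, L, q, q).

    Each residue pair (i, j) is scored by the Frobenius norm of its
    q x q coupling block, followed by the average product correction.
    """
    F = np.sqrt((J ** 2).sum(axis=(2, 3)))   # Frobenius norm per pair
    F = 0.5 * (F + F.T)                      # enforce symmetry
    np.fill_diagonal(F, 0.0)                 # ignore self-pairs
    row_mean = F.mean(axis=1, keepdims=True)
    col_mean = F.mean(axis=0, keepdims=True)
    return F - row_mean * col_mean / F.mean()

def top_contact_precision(scores, true_contacts, k):
    """Fraction of the k highest-scoring pairs (i < j) that are true contacts."""
    iu = np.triu_indices(scores.shape[0], k=1)
    order = np.argsort(scores[iu])[::-1][:k]
    return true_contacts[iu][order].mean()

# Toy example: a single strongly coupled pair (0, 3) in a length-4 chain.
L, q = 4, 2
J = np.zeros((L, L, q, q))
J[0, 3] = J[3, 0] = 1.0
contacts = np.zeros((L, L), dtype=int)
contacts[0, 3] = contacts[3, 0] = 1

scores = apc_contact_scores(J)
precision = top_contact_precision(scores, contacts, k=1)
```

Robustness to phylogenetic noise could then be probed by comparing this precision between couplings inferred from MSAs generated with and without phylogeny.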

Main Methods:

  • Analyzing column attentions of MSA Transformer on biological sequence data.
  • Generating synthetic MSAs with and without phylogeny using Potts models.
  • Evaluating unsupervised contact prediction performance using MSA Transformer and Potts models.
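The synthetic-data step above can be illustrated with a minimal sketch of drawing independent equilibrium sequences from a Potts model by Metropolis Monte Carlo, which corresponds to the "without phylogeny" case. The fields `h` and couplings `J` here are random placeholders rather than parameters inferred from a real protein family, the function names are hypothetical, and the study's procedure for adding phylogeny is not reproduced.

```python
import numpy as np

def potts_energy(seq, h, J):
    """E(seq) = -sum_i h[i, a_i] - sum_{i<j} J[i, j, a_i, a_j]."""
    L = len(seq)
    e = -h[np.arange(L), seq].sum()
    for i in range(L):
        for j in range(i + 1, L):
            e -= J[i, j, seq[i], seq[j]]
    return e

def sample_msa(h, J, n_seqs, n_sweeps, rng):
    """Sample n_seqs independent sequences, each from its own Metropolis chain."""
    L, q = h.shape
    msa = np.empty((n_seqs, L), dtype=int)
    for n in range(n_seqs):
        seq = rng.integers(q, size=L)        # random starting sequence
        e = potts_energy(seq, h, J)
        for _ in range(n_sweeps * L):
            i = rng.integers(L)              # pick a site to mutate
            new = seq.copy()
            new[i] = rng.integers(q)         # propose a new state at that site
            e_new = potts_energy(new, h, J)
            # Metropolis rule: always accept downhill, uphill with prob exp(-dE)
            if e_new <= e or rng.random() < np.exp(e - e_new):
                seq, e = new, e_new
        msa[n] = seq
    return msa

# Placeholder parameters (assumption): random fields and couplings.
rng = np.random.default_rng(0)
L, q = 6, 4
h = rng.normal(size=(L, q))
J = rng.normal(scale=0.5, size=(L, L, q, q))
msa = sample_msa(h, J, n_seqs=5, n_sweeps=3, rng=rng)
```

Recomputing the full energy at every proposal keeps the sketch short; a practical sampler would compute only the energy difference at the mutated site.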

Main Results:

  • Simple combinations of MSA Transformer's column attentions correlate strongly with the Hamming distances between sequences in an MSA, indicating that the model encodes phylogenetic relationships.
  • MSA-based language models successfully separate functional/structural coevolutionary signals from phylogenetic correlations.
  • Unsupervised contact prediction using MSA Transformer demonstrated greater resilience to phylogenetic noise compared to Potts models.
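The first result relies on Hamming distances, that is, the fraction of aligned positions at which two sequences differ. The sketch below computes them for a toy MSA and correlates them with a column-attention array. The attention layout (one M x M matrix per alignment column, for a single head) and the plain Pearson correlation are simplifying assumptions for illustration; they stand in for, and are not, the combination of attention heads used in the study.

```python
import numpy as np

def hamming_matrix(msa):
    """Pairwise normalized Hamming distances for a list of aligned sequences."""
    arr = np.array([list(s) for s in msa])            # (M, L) characters
    return (arr[:, None, :] != arr[None, :, :]).mean(axis=2)

def attention_hamming_correlation(col_attn, msa):
    """Pearson correlation between averaged attention and Hamming distances.

    col_attn: assumed shape (L, M, M), one M x M attention matrix per column.
    """
    avg = col_attn.mean(axis=0)                       # average over columns
    sym = 0.5 * (avg + avg.T)                         # symmetrize
    iu = np.triu_indices(len(msa), k=1)               # distinct sequence pairs
    return np.corrcoef(sym[iu], hamming_matrix(msa)[iu])[0, 1]

toy_msa = ["ACDE", "ACDF", "GCDF"]
D = hamming_matrix(toy_msa)
```

With real data, `col_attn` would come from the model's column-attention output for one layer and head.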

Conclusions:

  • MSA-based language models inherently encode detailed phylogenetic information.
  • These models offer a powerful way to disentangle coevolutionary signals, which reflect functional and structural constraints, from phylogenetic correlations, which reflect historical contingency.
  • The findings highlight the potential of protein language models for robust biological structure and function prediction.