
Related Concept Videos

Phylogenetic Trees (03:21)
Phylogenetic trees come in many forms. It matters in which sequence the organisms are arranged from the bottom to the top of the tree, but the branches can rotate at their nodes without altering the information. The lines connecting individual nodes can be straight, angled, or even curved.

Phylogeny (01:23)
Phylogeny is concerned with the evolutionary diversification of organisms or groups of organisms. A group of organisms with a name is called a taxon (singular). Taxa (plural) can span different levels of the evolutionary hierarchy. For instance, the group containing all birds is a taxon (comprising the class Aves), and the group of all species of daisies (the genus Bellis) is a taxon. Phylogenies can likewise include just one genus (i.e., depict species relationships) or span an entire kingdom.

Evolutionary Relationships through Genome Comparisons (02:54)
Genome comparison is an effective way to interpret the evolutionary relationships between organisms. The basic principle of genome comparison is that if two species share a common feature, it is likely encoded by the DNA sequence conserved between both species. The advent of genome sequencing technologies in the late 20th century enabled scientists to understand the concept of conservation of domains between species and helped them to deduce evolutionary relationships across diverse...

Gene Evolution - Fast or Slow? (02:05)
The genomes of eukaryotes are punctuated by long stretches of sequence which do not code for proteins or RNAs. Although some of these regions do contain crucial regulatory sequences, the vast majority of this DNA serves no known function. Typically, these regions of the genome are the ones in which the fastest change, in evolutionary terms, is observed, because there is typically little to no selection pressure acting on these regions to preserve their sequences.
In contrast, regions which code...

The Tree of Life - Bacteria, Archaea, Eukaryotes (02:40)
The “tree of life” describes the evolution of life and the evolutionary relationships between organisms. The root of the tree is the common ancestor to all life on Earth. All other species radiate from this point, much like the branches of a tree. The numerous tips of these branches on the tree of life represent every living, or extant, species. Extinct species, which are species that no longer exist, can be found towards the center of the tree. Currently, these organisms, both...

Protein Families (02:47)
Protein families are groups of homologous proteins; that is, they have similarities in amino acid sequences and three-dimensional structures. Protein families usually occur because of gene duplication, where an additional copy of a gene is inserted into the genome of an organism. Mutations that change the amino acids but still allow the protein to be properly synthesized will lead to new protein family members. If these new proteins contain similar amino acids in key...

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

  • Ancestral Protein Reconstruction Uncovers a Thermotolerant Rieske Oxygenase with Enhanced O-Demethylation Activity toward 3-O-Methylgallate. ACS synthetic biology, 2026.
  • Thermostable ancestors enabled evolutionary diversification of promiscuous chemical defence enzymes. The EMBO journal, 2026.
  • Complete structures of the YenTc holotoxin prepore and pore reveal the evolutionary basis for chitinase incorporation into ABC toxins. Nature communications, 2025.
  • Conserved facultative heterochromatin across cell types identify regulatory sequences underpinning cell identity and disease. Nucleic acids research, 2025.
  • Comprehensive molecular impact mapping of common and rare variants at GWAS loci. bioRxiv : the preprint server for biology, 2025.
  • Atlas of multilineage stem cell differentiation reveals TMEM88 as a developmental regulator of blood pressure. Nature communications, 2025.

Same journal

  • Optimal transport for label transfer in single-cell multi-omics integration. Briefings in bioinformatics, 2026.
  • Continuous multi-omics pathway enrichment analysis resolves hidden functional heterogeneity. Briefings in bioinformatics, 2026.
  • Evaluating completeness, coherence, and consistency of genome-scale function annotations. Briefings in bioinformatics, 2026.
  • Transformers for single-cell RNA sequencing: a survey. Briefings in bioinformatics, 2026.
  • CLABP: a contrastive learning framework integrating protein language models and structural information for antibacterial peptide prediction. Briefings in bioinformatics, 2026.
  • Toward the regularization of E value from BLAST similarity search into a dissimilarity measure as distance function, and the metrication of protein sequence space. Briefings in bioinformatics, 2026.

Related Experiment Video

Updated: May 26, 2025

A Practical Guide to Phylogenetics for Nonexperts (12:00)
Published on: February 5, 2014

Do protein language models learn phylogeny?

Sanjana Tule¹, Gabriel Foley¹, Mikael Bodén¹

  • ¹School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD 4072, Australia.

Briefings in Bioinformatics | February 23, 2025
Summary
This summary is machine-generated.

Protein language models (pLMs) like ESM2 can infer evolutionary relationships from protein sequences, mirroring classical phylogenetic methods. These models excel with divergent sequences and offer a complementary approach to phylogenetics, especially for complex evolutionary histories.

Keywords: explainable artificial intelligence, phylogeny, protein language models, sequence analysis

More Related Videos

Using Phylogenetic Analysis to Investigate Eukaryotic Gene Origin (08:57)
Published on: August 14, 2018

A Bioinformatics Pipeline for Investigating Molecular Evolution and Gene Expression using RNA-seq (07:09)
Published on: May 28, 2021


Area of Science:

  • Computational Biology
  • Bioinformatics
  • Machine Learning

Background:

  • Deep learning models, particularly protein language models (pLMs), show promise in analyzing protein sequences.
  • Classical phylogenetic tree inference derives evolutionary relationships from aligned sequence data.
  • The integration of machine learning with traditional phylogenetics is an emerging area of research.

Purpose of the Study:

  • To assess whether protein language models (pLMs) can discern phylogenetic relationships without being explicitly trained to do so.
  • To compare the performance of pLMs (ESM2, ProtTrans, MSA-Transformer) against classical phylogenetic methods.
  • To investigate the impact of sequence insertions and deletions (indels) on pLM performance in phylogenetic analysis.

Main Methods:

  • Evaluation of ESM2, ProtTrans, and MSA-Transformer on 114 Pfam datasets.
  • Comparison with established phylogenetic inference techniques.
  • Analysis of performance across varying levels of sequence insertions and deletions (indels).
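
One way to picture the comparison described in the methods above is to correlate pairwise distances between sequence embeddings with pairwise distances read off a reference tree. The sketch below is illustrative only: this summary does not state which distance or agreement measure the study used, so the Euclidean distance, the Pearson correlation, the function names, and the toy values are all assumptions. In practice the embeddings would come from a pLM such as ESM2 and the tree distances from a classical phylogenetic method.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def embedding_tree_agreement(embeddings, tree_distances):
    """Correlate embedding distances with tree distances over all pairs.

    embeddings:     dict mapping sequence name -> embedding vector
    tree_distances: dict mapping frozenset({name_a, name_b}) -> tree distance
    """
    emb, tree = [], []
    for a, b in combinations(sorted(embeddings), 2):
        emb.append(euclidean(embeddings[a], embeddings[b]))
        tree.append(tree_distances[frozenset((a, b))])
    return pearson(emb, tree)


# Toy inputs (hypothetical, not data from the study).
toy_embeddings = {"seqA": [0.0, 0.0], "seqB": [1.0, 0.0], "seqC": [4.0, 0.0]}
toy_tree = {
    frozenset(("seqA", "seqB")): 0.1,
    frozenset(("seqA", "seqC")): 0.4,
    frozenset(("seqB", "seqC")): 0.3,
}
score = embedding_tree_agreement(toy_embeddings, toy_tree)
print(f"agreement: {score:.3f}")
```

A rank-based correlation would be a reasonable alternative when the relationship between the two distance scales is monotonic but not linear.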

Main Results:

  • The largest ESM2 model demonstrated superior performance in recovering phylogenetic relationships across diverse datasets and indel levels.
  • pLMs generally agree with classical methods, with higher concordance observed in protein families exhibiting fewer indels.
  • pLMs capture broader evolutionary relationships, with ESM2 showing particular strength in analyzing highly divergent sequences.
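
To make the indel-level comparison concrete, the following sketch groups protein families by how gap-rich their alignments are and averages an agreement score within each group. It is a rough illustration under stated assumptions: the summary does not say how indel level was quantified, so the gap fraction, the 0.2 threshold, the function names, and the toy scores are hypothetical and are not results from the study.

```python
def gap_fraction(alignment):
    """Fraction of cells in an alignment that are gap characters ('-')."""
    total = sum(len(row) for row in alignment)
    gaps = sum(row.count("-") for row in alignment)
    return gaps / total if total else 0.0


def mean_agreement_by_indel_level(families, threshold=0.2):
    """Average agreement scores for low-indel and high-indel families.

    families:  list of (alignment, agreement_score) pairs
    threshold: assumed gap-fraction cutoff separating the two groups
    """
    bins = {"low_indel": [], "high_indel": []}
    for alignment, score in families:
        key = "low_indel" if gap_fraction(alignment) <= threshold else "high_indel"
        bins[key].append(score)
    # Report None for an empty group rather than dividing by zero.
    return {k: (sum(v) / len(v) if v else None) for k, v in bins.items()}


# Toy families with made-up agreement scores (illustration only).
toy_families = [
    (["ACDE", "ACDE"], 0.9),  # no gaps
    (["A--E", "AC-E"], 0.5),  # 3 gaps out of 8 cells
]
summary = mean_agreement_by_indel_level(toy_families)
print(summary)
```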

Conclusions:

  • Protein language models, especially ESM2, can effectively infer evolutionary relationships and serve as a valuable complement to traditional phylogenetic methods.
  • Sequence indels represent a key factor influencing the differences between pLM-based and classical phylogenetic approaches.
  • A small subset of neurons within pLMs is sufficient to approximate phylogenetic distances, indicating efficient representation of evolutionary information.
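
The last conclusion suggests that only a few embedding dimensions carry most of the phylogenetic signal. A minimal sketch of one way to look for such dimensions is to score each dimension by how well its per-pair absolute differences track tree distances, then keep the top few. This is an assumed procedure for illustration; the summary does not describe how the study identified the relevant neurons, and all names and toy values here are hypothetical.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def rank_dimensions(embeddings, tree_distances):
    """Order embedding dimensions from most to least tree-like.

    embeddings:     dict mapping sequence name -> embedding vector
    tree_distances: dict mapping frozenset({name_a, name_b}) -> tree distance
    """
    names = sorted(embeddings)
    pairs = list(combinations(names, 2))
    target = [tree_distances[frozenset(p)] for p in pairs]
    n_dims = len(embeddings[names[0]])
    scored = []
    for d in range(n_dims):
        # Per-pair absolute difference along this single dimension.
        diffs = [abs(embeddings[a][d] - embeddings[b][d]) for a, b in pairs]
        scored.append((pearson(diffs, target), d))
    return [d for _, d in sorted(scored, reverse=True)]


# Toy inputs (hypothetical): only dimension 0 tracks the tree distances.
toy_embeddings = {
    "seqA": [0.0, 5.0, 1.0],
    "seqB": [1.0, 5.0, 0.0],
    "seqC": [4.0, 5.0, 1.0],
}
toy_tree = {
    frozenset(("seqA", "seqB")): 0.1,
    frozenset(("seqA", "seqC")): 0.4,
    frozenset(("seqB", "seqC")): 0.3,
}
ranking = rank_dimensions(toy_embeddings, toy_tree)
print("dimensions, best first:", ranking)
```

Scoring dimensions one at a time ignores interactions between them; a sparse regression over all dimensions would be a natural next step, but it is outside what this summary describes.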