MotifAE Reveals Functional Sequence Patterns from Protein Language Model: Unsupervised Discovery and Interpretability

Chao Hou (1), Di Liu (2), Yufeng Shen (1, 2, 3)

  • (1) Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032.

bioRxiv: the Preprint Server for Biology | November 24, 2025
Summary
This summary is machine-generated.

MotifAE is an unsupervised framework that interprets protein language models (pLMs) by revealing the sequence patterns they have learned. This interpretable approach supports protein function discovery and engineering.

Keywords:
fitness landscapes; functional motif discovery; model interpretability; protein language models

Area of Science:

  • Computational biology
  • Bioinformatics
  • Machine learning in protein science

Background:

  • Protein language models (pLMs) capture evolutionary sequence patterns but function as black boxes.
  • Interpreting learned patterns is crucial for understanding protein function and engineering.

Purpose of the Study:

  • To develop an unsupervised framework, MotifAE, for interpreting patterns learned by pLMs.
  • To enable the discovery and analysis of functional motifs and structural domains within protein sequences.
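One way to read a candidate motif off a single latent feature, offered here only as a hypothetical illustration (the summary does not describe MotifAE's post-processing), is to threshold that feature's per-residue activation and keep contiguous runs above the threshold. The function name `active_spans`, the threshold, the minimum-length filter, and the toy sequence and activations are all assumptions.

```python
def active_spans(activation, threshold, min_len=3):
    """Return half-open (start, end) index pairs for runs of consecutive
    residues whose activation exceeds `threshold` and that span at least
    `min_len` residues. Hypothetical post-processing, not from the paper."""
    spans = []
    start = None
    for i, a in enumerate(activation):
        if a > threshold:
            if start is None:
                start = i  # a new run of active residues begins
        elif start is not None:
            if i - start >= min_len:
                spans.append((start, i))
            start = None  # the run ended at residue i - 1
    # Close a run that extends to the end of the sequence.
    if start is not None and len(activation) - start >= min_len:
        spans.append((start, len(activation)))
    return spans

# Toy example: a 12-residue sequence and one feature's activation profile.
seq = "MKTAYIAKQRQI"
act = [0.0, 0.1, 0.9, 1.2, 0.8, 0.0, 0.7, 0.0, 1.5, 1.1, 0.9, 1.3]
for s, e in active_spans(act, threshold=0.5):
    print(s, e, seq[s:e])
```

On this toy input the loop prints `2 5 TAY` and `8 12 QRQI`; the isolated active residue at index 6 is dropped by the length filter, which is the kind of single-position noise a smoothness penalty would also discourage.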

Main Methods:

  • Developed MotifAE, an unsupervised framework using a sparse autoencoder (SAE) architecture.
  • Incorporated a smoothness loss to improve feature coherence and motif identification.
  • Projected pLM embeddings into an interpretable, sparse latent space.
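The methods above can be illustrated with a minimal NumPy sketch of a sparse-autoencoder objective with an added smoothness term. This is not the authors' implementation: the ReLU encoder, the L1 sparsity penalty, the specific form of the smoothness term (squared difference of each latent feature between adjacent residues), the coefficients, and the dimensions are all assumptions, and random matrices stand in for real pLM embeddings and trained weights.

```python
import numpy as np

def sae_loss_with_smoothness(emb, W_enc, b_enc, W_dec, b_dec,
                             l1_coef=1e-3, smooth_coef=1e-2):
    """Sparse-autoencoder loss for one protein (illustrative sketch).

    emb: (L, d) per-residue embeddings from a pLM (L residues, d dims).
    Returns a dict of loss terms and the latent codes z of shape (L, k).
    """
    # Encoder: ReLU keeps codes non-negative; the L1 term keeps them sparse.
    z = np.maximum(emb @ W_enc + b_enc, 0.0)
    # Decoder: linear reconstruction of the original embeddings.
    recon = z @ W_dec + b_dec
    recon_loss = float(np.mean((recon - emb) ** 2))
    l1 = float(np.mean(np.abs(z)))
    # Assumed smoothness term: penalize changes in each feature's
    # activation between adjacent residues along the sequence.
    smooth = float(np.mean((z[1:] - z[:-1]) ** 2)) if z.shape[0] > 1 else 0.0
    total = recon_loss + l1_coef * l1 + smooth_coef * smooth
    return {"total": total, "recon": recon_loss, "l1": l1, "smooth": smooth}, z

# Usage with random stand-ins for pLM embeddings and untrained weights.
rng = np.random.default_rng(0)
L, d, k = 50, 32, 128  # hypothetical sizes; k > d gives an overcomplete latent space
emb = rng.normal(size=(L, d))
W_enc = rng.normal(scale=0.1, size=(d, k))
b_enc = np.zeros(k)
W_dec = rng.normal(scale=0.1, size=(k, d))
b_dec = np.zeros(d)
terms, z = sae_loss_with_smoothness(emb, W_enc, b_enc, W_dec, b_dec)
print(z.shape, terms)
```

Because the assumed smoothness term compares only neighbouring residues, it pushes a feature to stay on across a contiguous stretch rather than flicker at single positions, which is one plausible way such a loss could favour coherent, motif-like features. In practice the weights would be trained by minimizing `total` over many proteins.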

Main Results:

  • MotifAE successfully identified known functional motifs and diverse sequence patterns.
  • The framework captured structural domains; feature activation correlated with residue importance and domain function.
  • Identified features associated with domain folding stability, enabling improved stability prediction and engineering.
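To make "feature activation correlated with residue importance" concrete, here is a hedged sketch that computes a Pearson correlation between each latent feature's per-residue activation and a per-residue importance score. The summary does not say which importance measure or correlation statistic the authors used; the function name and toy arrays are hypothetical.

```python
import numpy as np

def feature_importance_correlation(z, importance):
    """Pearson r between each feature's activation profile and residue importance.

    z: (L, k) latent activations for one protein; importance: (L,) scores.
    Returns a (k,) array; features that are constant along the sequence get NaN.
    """
    z = np.asarray(z, dtype=float)
    imp = np.asarray(importance, dtype=float)
    zc = z - z.mean(axis=0)   # center each feature over residues
    ic = imp - imp.mean()     # center the importance profile
    denom = np.sqrt((zc ** 2).sum(axis=0) * (ic ** 2).sum())
    with np.errstate(invalid="ignore", divide="ignore"):
        return (zc * ic[:, None]).sum(axis=0) / denom

# Toy check: feature 0 tracks importance, feature 1 is inverted, feature 2 is flat.
imp = np.array([0.1, 0.5, 0.9, 0.2, 0.7])
z = np.column_stack([imp, -imp, np.ones(5)])
print(feature_importance_correlation(z, imp))  # approximately [1, -1, nan]
```

Features with high positive r would be candidates for further inspection; a rank correlation could be swapped in if importance scores are skewed.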

Conclusions:

  • MotifAE provides a general framework for systematic sequence pattern discovery and interpretation.
  • The approach advances protein function analysis, mutation effect interpretation, and rational protein engineering.
  • MotifAE facilitates the engineering of proteins with enhanced stability.