Understanding Language Model Scaling on Protein Fitness Prediction.

Chao Hou¹, Di Liu², Aziz Zafar²

  • ¹ Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032.

bioRxiv: the preprint server for biology | August 8, 2025
Summary
This summary is machine-generated.

The fitness-prediction performance of protein language models declines once models grow beyond a certain size. Performance is best when the predicted sequence likelihood is moderate rather than extreme, which challenges the "bigger is better" assumption in deep learning.

Keywords:
mutation effect; protein fitness landscape; self-supervised deep training; sequence likelihood

Area of Science:

  • Computational Biology
  • Machine Learning
  • Protein Engineering

Background:

  • Protein language models estimate sequence likelihoods (p(sequence)) for mutation effect prediction and protein design.
  • Larger deep learning models are generally assumed to perform better across various tasks.
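
The summary does not say how p(sequence) is computed or how mutations are scored. As a minimal sketch, assuming a model that outputs a probability for each amino acid at each position, the sequence likelihood can be approximated by summing per-position log-probabilities, and a mutation can be scored by the log-odds of the mutant versus the wild-type residue. The names `position_probs` and `toy_probs`, the three-letter alphabet, and the probability values are hypothetical and chosen only for illustration.

```python
import math


def sequence_log_likelihood(sequence, position_probs):
    """Sum log p(residue) over positions (a pseudo-log-likelihood)."""
    return sum(math.log(position_probs[i][aa]) for i, aa in enumerate(sequence))


def mutation_score(wild_type, position, mutant_aa, position_probs):
    """Log-odds of the mutant residue versus the wild-type residue."""
    probs = position_probs[position]
    return math.log(probs[mutant_aa]) - math.log(probs[wild_type[position]])


# Hypothetical model output for a 3-residue toy protein (assumed values).
toy_probs = [
    {"A": 0.7, "G": 0.2, "V": 0.1},
    {"A": 0.1, "G": 0.8, "V": 0.1},
    {"A": 0.3, "G": 0.3, "V": 0.4},
]
wild_type = "AGV"

print(sequence_log_likelihood(wild_type, toy_probs))
print(mutation_score(wild_type, 0, "G", toy_probs))
```

A negative score means the model considers the mutant residue less likely than the wild type at that position. Whether that tracks real fitness is the question the study examines.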

Purpose of the Study:

  • To investigate the scalability of protein language models for fitness prediction.
  • To understand how model size, training data, and stochasticity affect predicted p(sequence) and its relation to real protein fitness.

Main Methods:

  • Analysis of protein language model performance on fitness prediction across different model sizes and training datasets.
  • Evaluation of how predicted p(sequence) correlates with evolutionary patterns in homologous sequences.
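
The summary does not name the evaluation metric. A rank correlation between predicted scores and measured fitness is a common choice for this kind of comparison, so the sketch below assumes Spearman correlation and implements it in plain Python. The function names and the example values are illustrative, not taken from the study.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 by convention if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def spearman(predicted, measured):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(predicted), average_ranks(measured))


# Illustrative values only: predicted mutation scores versus measured fitness.
predicted_scores = [-3.1, -0.4, -1.7, -2.2]
measured_fitness = [0.10, 0.95, 0.60, 0.30]
print(spearman(predicted_scores, measured_fitness))
```

If a model assigns the same score to every mutation, the ranks are constant and the correlation carries no information (this sketch returns 0.0 in that case).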

Main Results:

  • Model performance in fitness prediction decreases beyond a certain size, contrary to general deep learning trends.
  • Model size, training data, and stochastic elements can bias predicted p(sequence) away from actual protein fitness.
  • Optimal fitness prediction occurs when p(sequence) matches evolutionary patterns at a moderate level; extreme likelihoods lead to uniform predictions, failing to capture the fitness landscape.

Conclusions:

  • Larger protein language models tend to predict higher p(sequence), which can exceed the optimal moderate range and reduce fitness prediction accuracy.
  • The findings clarify the scaling behavior of protein language models for fitness prediction and offer practical guidelines for model development and application.