



Resolving data bias improves generalization in binding affinity prediction.

David Graber (1,2,3), Peter Stockinger (2,4), Fabian Meyer (2)

  • (1) Seminar for Applied Mathematics, Department of Mathematics and ETH AI Center, Zurich, Switzerland.

Nature Machine Intelligence | October 27, 2025
Summary
This summary is machine-generated.

Train-test data leakage has inflated the reported performance of protein-ligand binding affinity prediction models. The PDBbind CleanSplit dataset and a new graph neural network model give a more realistic picture of how well such models generalize, which matters for computational drug design.

Keywords:
Cheminformatics, Drug discovery, Machine learning, Scientific data


Area of Science:

  • Computational chemistry
  • Structural biology
  • Machine learning

Background:

  • Accurate prediction of protein-ligand binding affinities is crucial for computational drug design.
  • Existing deep learning models often show inflated performance due to train-test data leakage from the PDBbind database and benchmark datasets.
  • This leakage overestimates the generalization capabilities of current binding affinity prediction models.
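
To make the leakage problem above concrete, here is a minimal toy sketch in Python. It is not the models or data from the study: the feature vectors, affinity values, and the 1-nearest-neighbor "model" are all made up for illustration. It shows how a model that merely memorizes its training set can look perfect on a test set that overlaps with the training data, and much worse on an independent one.

```python
# Toy illustration only: features and affinity labels are invented,
# and a 1-nearest-neighbor lookup stands in for a trained model.

def nearest_neighbor_predict(train, query):
    """Return the label of the training example closest to `query`."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(train, key=lambda item: sq_dist(item[0], query))
    return label

def mean_abs_error(train, test):
    """Mean absolute error of the 1-NN predictor on `test`."""
    errors = [abs(nearest_neighbor_predict(train, x) - y) for x, y in test]
    return sum(errors) / len(errors)

# (features, affinity) pairs; all values are hypothetical.
train = [((0.0, 0.0), 5.0), ((1.0, 0.0), 6.0), ((0.0, 1.0), 7.0)]

# "Leaky" test set: every entry also appears in the training set.
leaky_test = [((0.0, 0.0), 5.0), ((1.0, 0.0), 6.0)]

# "Clean" test set: entries are far from anything seen in training.
clean_test = [((3.0, 2.0), 9.0), ((4.0, 1.0), 3.0)]

leaky_mae = mean_abs_error(train, leaky_test)
clean_mae = mean_abs_error(train, clean_test)
print(f"MAE on leaky test set: {leaky_mae}")
print(f"MAE on clean test set: {clean_mae}")
```

The gap between the two errors comes entirely from the overlap between training and test data, not from any difference in the model.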

Purpose of the Study:

  • To address the issue of train-test data leakage in binding affinity prediction datasets.
  • To develop a reliable benchmark dataset and a robust model for evaluating generalization capabilities.
  • To identify the true performance of deep learning models in computational drug design.

Main Methods:

  • Developed PDBbind CleanSplit, a curated training dataset using a novel structure-based filtering algorithm to eliminate data leakage and redundancy.
  • Retrained existing top-performing deep learning models on the CleanSplit dataset.
  • Developed a novel graph neural network model utilizing sparse graph modeling of protein-ligand interactions and transfer learning from language models.
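
The summary does not describe how the structure-based filtering algorithm works, so the following is only a rough sketch of the general idea of similarity-based filtering. It assumes each complex is reduced to a set of integer features and compared with Tanimoto (Jaccard) similarity. The function names, the `threshold` parameter, and the example identifiers are hypothetical and are not taken from PDBbind CleanSplit.

```python
# Sketch under stated assumptions: set-based features and a Tanimoto
# threshold are stand-ins for the study's structure-based criteria.

def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def filter_training_set(train, test, threshold=0.8):
    """Keep training entries that are below `threshold` similarity
    to every test entry; drop the rest as potential leakage."""
    kept = {}
    for name, feats in train.items():
        if all(tanimoto(feats, t) < threshold for t in test.values()):
            kept[name] = feats
    return kept

# Hypothetical complexes, each described by a made-up feature set.
train = {
    "train_a": {1, 2, 3, 4},   # identical to test_x
    "train_b": {1, 2, 3, 5},   # similarity 0.6 to test_x
    "train_c": {7, 8, 9},      # unrelated to any test entry
}
test = {"test_x": {1, 2, 3, 4}, "test_y": {10, 11}}

strict = filter_training_set(train, test, threshold=0.5)
default = filter_training_set(train, test)
print(sorted(default))  # entries kept at threshold 0.8
print(sorted(strict))   # entries kept at threshold 0.5
```

Lowering the threshold removes more training entries, trading training set size for stricter separation from the test set. Removing redundancy within the training set itself would need a similar pairwise pass over the training entries, which this sketch omits.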

Main Results:

  • Retraining existing models on CleanSplit resulted in a substantial drop in performance, confirming the significant impact of data leakage.
  • The proposed graph neural network model maintained high performance on the CleanSplit benchmark.
  • The graph neural network model demonstrated strong generalization capabilities on strictly independent test datasets.
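
The summary does not say which metrics were used to measure performance. As an assumption, the sketch below computes two measures that are common for affinity regression, root-mean-square error (RMSE) and the Pearson correlation coefficient, on made-up values. The numbers are illustrative only and are not results from the study.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

# Hypothetical measured and predicted affinities (arbitrary units).
measured = [4.0, 6.0, 8.0]
predicted = [5.0, 6.0, 7.0]

error = rmse(measured, predicted)
correlation = pearson_r(measured, predicted)
print(f"RMSE: {error:.4f}")
print(f"Pearson r: {correlation:.4f}")
```

Computing the same metrics once on a leakage-affected test set and once on a strictly independent one is the kind of comparison that exposes a drop in performance.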

Conclusions:

  • The performance of many current deep learning models for binding affinity prediction is largely overestimated due to data leakage.
  • The PDBbind CleanSplit dataset provides a more realistic evaluation of model generalization.
  • The developed graph neural network model offers a promising approach for accurate and generalizable binding affinity prediction in drug design.