Benchmarking Sequence-Based Compound-Protein Interaction Prediction through Constructing a Debiased Data Set CDPN.

Yang Hao1,2,3,4, Bo Li2,3, Daiyun Huang2,5

  • 1Department of Hepatobiliary Surgery, Haikou Affiliated Hospital of Central South University Xiangya School of Medicine, Haikou 570208, P.R. China.

Journal of Chemical Information and Modeling, November 20, 2025
Summary
This summary is machine-generated.

Accurate compound-protein interaction (CPI) prediction is vital for drug discovery. A new method, CDPN, debiases datasets to improve machine learning model generalization and virtual screening performance.

Area of Science:

  • Computational chemistry
  • Drug discovery
  • Machine learning

Background:

  • Accurate prediction of compound-protein interactions (CPIs) is crucial for accelerating drug discovery.
  • Existing datasets often contain biases, such as over-represented molecular scaffolds and imbalanced label distributions, which can lead to machine learning shortcuts and hinder model generalization.
  • Current debiasing methods may compromise dataset diversity.
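The two biases named above can be quantified with simple counts. The sketch below is only an illustration: the function names, the use of a scaffold or cluster label per compound, and the toy record layout are assumptions, not part of the published protocol.

```python
from collections import Counter, defaultdict


def largest_group_share(group_ids):
    """Fraction of compounds that fall in the single most common scaffold/cluster."""
    if not group_ids:
        raise ValueError("group_ids must not be empty")
    counts = Counter(group_ids)
    return max(counts.values()) / len(group_ids)


def positive_fraction_per_protein(records):
    """Map each protein to the fraction of its records labeled 1 (interacting).

    `records` is an iterable of (protein_id, label) pairs with label in {0, 1}.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for protein, label in records:
        totals[protein] += 1
        positives[protein] += label
    return {protein: positives[protein] / totals[protein] for protein in totals}
```

A largest-group share close to 1, or per-protein positive fractions close to 0 or 1, would suggest that a model could score well by memorizing scaffolds or protein-level label priors rather than learning interactions.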

Purpose of the Study:

  • To address biases in compound-protein interaction (CPI) datasets.
  • To introduce a novel protocol, Clustering-based Down-sampling and Putative Negatives (CDPN), for constructing a debiased CPI benchmark.
  • To systematically benchmark deep learning-based CPI models, particularly protein language models, using the CDPN dataset.

Main Methods:

  • Developed the Clustering-based Down-sampling and Putative Negatives (CDPN) protocol.
  • CDPN mitigates biases via compound cluster-level down-sampling.
  • Generated putative negatives from unexplored chemical spaces to ensure balanced label distributions.
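The two steps above can be illustrated with a minimal sketch. It is not the authors' implementation: the precomputed cluster assignments, the per-cluster cap `max_per_cluster`, and the rule of treating unobserved compound-protein pairs as putative negatives are all assumptions made for illustration. The summary does not specify how clusters are computed or how "unexplored chemical space" is defined.

```python
import random
from collections import defaultdict


def downsample_by_cluster(pairs, cluster_of, max_per_cluster, seed=0):
    """Keep at most `max_per_cluster` (compound, protein) pairs per compound cluster.

    `cluster_of` maps a compound id to a cluster id (assumed precomputed).
    """
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for compound, protein in pairs:
        by_cluster[cluster_of[compound]].append((compound, protein))
    kept = []
    for cluster_id in sorted(by_cluster):
        members = by_cluster[cluster_id]
        if len(members) > max_per_cluster:
            members = rng.sample(members, max_per_cluster)
        kept.extend(members)
    return kept


def add_putative_negatives(positives, candidate_compounds, seed=0):
    """Balance labels by sampling unobserved compounds as negatives per protein.

    For each protein, draws as many compounds from `candidate_compounds` (a list)
    as the protein has positives, skipping pairs already observed as positive.
    Returns (compound, protein, label) triples with label 1 or 0.
    """
    rng = random.Random(seed)
    observed = set(positives)
    n_pos = defaultdict(int)
    for _, protein in positives:
        n_pos[protein] += 1
    negatives = []
    for protein in sorted(n_pos):
        pool = [c for c in candidate_compounds if (c, protein) not in observed]
        k = min(n_pos[protein], len(pool))
        negatives.extend((c, protein) for c in rng.sample(pool, k))
    return [(c, p, 1) for c, p in positives] + [(c, p, 0) for c, p in negatives]
```

Capping each cluster rather than discarding whole clusters is one way to reduce scaffold over-representation while keeping every cluster present in the data, which is the diversity concern raised in the Background.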

Main Results:

  • Systematically benchmarked deep learning CPI models on the CDPN dataset.
  • Identified limitations in attention interpretability for protein language models on PDBbind.
  • Through ablation studies on the CDPN dataset, identified KPGT-Ankh as the strongest model, with improved generalization and virtual screening performance.

Conclusions:

  • The CDPN protocol effectively creates a debiased CPI benchmark.
  • KPGT-Ankh demonstrates superior performance for CPI prediction.
  • Top-performing models were integrated into DeepSEQreen, a no-code web server, to enhance accessibility and facilitate community feedback for drug discovery research.