Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

Why machine learning fails at mass spectrometry for small molecules.

Nature metabolism·2026
Same author

Reply to: Limited evidence of AI superiority in seasonal influenza vaccine strain selection.

Nature medicine·2026
Same author

Protein FID: improved evaluation of protein structure generative models.

Bioinformatics (Oxford, England)·2026
Same author

Bridging the gap: aligning clinical decision support regulation with clinical practice in the era of artificial intelligence.

The Lancet. Digital health·2026
Same author

The ADAPT learning cancer treatment system: ARPA-H's initiative to revolutionize cancer therapy.

Cancer cell·2026
Same author

BoltzGen: Toward Universal Binder Design.

bioRxiv : the preprint server for biology·2025
Same journal

Correction to "AstraMEV (AI-Guided Structural Assembly of Multi-Epitope Vaccines) Against Infectious Bronchitis Virus".

Journal of chemical information and modeling·2026
Same journal

MolPy: A Large Language Model-Friendly Toolkit for Reactive Topology Editing in Polymer Simulations.

Journal of chemical information and modeling·2026
Same journal

Molecular Mechanisms of KIT Receptor Dimerization and Oncogenic Activation Revealed by Multiscale Simulations.

Journal of chemical information and modeling·2026
Same journal

Structural and Thermodynamic Discrimination between Agonists and Antagonists of Retinoic Acid Receptor γ and the Vitamin D Receptor.

Journal of chemical information and modeling·2026
Same journal

PACEff Builder: An Efficient Platform for Constructing PACE Hybrid-Resolution Models for Molecular Dynamics Simulations of Aqueous Protein, Peptide Assembly, and Membrane Protein Systems.

Journal of chemical information and modeling·2026
Same journal

TransKla: A Local-Global Cross-Attention Based Transformer Approach for Prediction of Lysine Lactylation Sites.

Journal of chemical information and modeling·2026

Related Experiment Video

Updated: Mar 24, 2026

Selecting Multiple Biomarker Subsets with Similarly Effective Binary Classification Performances
07:35

Published on: October 11, 2018

8.1K

AssayMatch: Learning To Select Data for Molecular Activity Models.

Vincent Fan1, Regina Barzilay1,2

  • 1Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States.

Journal of Chemical Information and Modeling | March 23, 2026
Summary
This summary is machine-generated.

AssayMatch improves machine learning for drug discovery by selecting high-quality, homogeneous training datasets. The framework filters out noisy or incompatible experimental data, which improves the models' predictive power and data efficiency.

More Related Videos

Quantitative Structure-Activity Relationship, Activity Prediction, and Molecular Dynamics of Non-nucleotide Reverse Transcriptase Inhibitors
10:29

Published on: May 9, 2025

2.6K
Author Spotlight: Advancing Alzheimer's Research – Exploring Early Detection and Multi-Omics Approaches
09:47

Published on: December 15, 2023

2.0K

Area of Science:

  • Computational chemistry
  • Machine learning in drug discovery
  • Bioinformatics

Background:

  • Machine learning model performance in drug discovery relies heavily on training data quality.
  • Aggregating bioactivity data from diverse sources such as ChEMBL introduces noise from experimental variability between assays.
  • Existing methods struggle to select compatible training data for novel drug candidates.
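The noise problem described above can be illustrated with a toy simulation that is not taken from the paper. Several hypothetical assays measure the same underlying activity. Two of them report it on a shifted and rescaled readout, which stands in for differing protocols or units. The offsets, scales, linear activity model and least-squares learner are all illustrative assumptions. In this setup, training only on the compatible assays gives a lower test error than pooling all of them.

```python
import numpy as np

def make_assay(rng, w, n, offset, scale, noise=0.1):
    """Simulate one hypothetical assay: true activity X @ w reported as scale * activity + offset + noise."""
    X = rng.normal(size=(n, w.size))
    y = scale * (X @ w) + offset + rng.normal(scale=noise, size=n)
    return X, y

def fit_linear(X, y):
    """Ordinary least squares; the intercept is stored as the last coefficient."""
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def eval_mse(coef, X, y):
    """Mean squared error of the fitted linear model on (X, y)."""
    pred = X @ coef[:-1] + coef[-1]
    return float(np.mean((pred - y) ** 2))

rng = np.random.default_rng(0)
w = rng.normal(size=8)  # hypothetical "true" structure-activity weights

# (offset, scale) per assay: assays 0 and 1 match the test assay; 2 and 3 do not.
assay_params = [(0.0, 1.0), (0.0, 1.0), (3.0, 0.5), (2.0, 0.5)]
assays = [make_assay(rng, w, 200, o, s) for o, s in assay_params]
X_test, y_test = make_assay(rng, w, 200, 0.0, 1.0)

def pool(indices):
    """Concatenate the selected assays into one training set."""
    X = np.vstack([assays[i][0] for i in indices])
    y = np.concatenate([assays[i][1] for i in indices])
    return X, y

pooled_mse = eval_mse(fit_linear(*pool([0, 1, 2, 3])), X_test, y_test)
subset_mse = eval_mse(fit_linear(*pool([0, 1])), X_test, y_test)
print(f"pooled: {pooled_mse:.3f}  compatible subset: {subset_mse:.3f}")
```

The pooled model learns an average of the incompatible readouts, so its error reflects the assay-specific offsets and scales. Extra data cannot remove that bias.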

Purpose of the Study:

  • To introduce AssayMatch, a novel framework for selecting homogeneous training datasets in drug discovery.
  • To improve the predictive accuracy and data efficiency of machine learning models by reducing experimental noise.
  • To enable data selection for test sets whose labels are unknown, as in real-world drug discovery, where the activities of candidate molecules are not known in advance.

Main Methods:

  • AssayMatch uses data attribution methods to quantify how much each training assay contributes to model performance.
  • It uses these attribution scores to fine-tune language embeddings of the assay descriptions, so that the embeddings capture compatibility between assays as well as semantic similarity.
  • At test time, the fine-tuned embeddings are used to rank the available training data, and the top-ranked data are selected for training.
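The attribution and ranking steps above can be sketched in miniature. This is a minimal sketch under stated assumptions, not AssayMatch's implementation. As a simple stand-in for the paper's data attribution method, it scores each training assay by leave-one-assay-out retraining: the score is how much the validation error rises when that assay is dropped. The embedding fine-tuning step is omitted. Small hand-written vectors stand in for embeddings that are assumed to be already fine-tuned. Ranking by cosine similarity to the test assay's embedding, and the choice of k, are also assumptions. All data and function names here are hypothetical.

```python
import numpy as np

def fit_predict(X_train, y_train, X_eval):
    """Least-squares linear model with intercept; a stand-in for any activity model."""
    A = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return X_eval @ coef[:-1] + coef[-1]

def loo_assay_attribution(assays, X_val, y_val):
    """Leave-one-assay-out attribution score for each training assay.

    score_i = val_MSE(without assay i) - val_MSE(all assays).
    Positive: the assay helps. Negative: dropping it helps (noisy or incompatible).
    """
    def val_mse(indices):
        X = np.vstack([assays[i][0] for i in indices])
        y = np.concatenate([assays[i][1] for i in indices])
        return float(np.mean((fit_predict(X, y, X_val) - y_val) ** 2))

    everything = list(range(len(assays)))
    base = val_mse(everything)
    return np.array([val_mse([j for j in everything if j != i]) - base for i in everything])

def rank_by_embedding(train_emb, test_emb):
    """Order training assays by cosine similarity of their embeddings to the test assay's embedding."""
    train = train_emb / np.linalg.norm(train_emb, axis=1, keepdims=True)
    test = test_emb / np.linalg.norm(test_emb)
    sims = train @ test
    return np.argsort(-sims), sims

# Hypothetical toy data: assays 0-1 match the validation assay; 2-3 are offset and rescaled.
rng = np.random.default_rng(1)
w = rng.normal(size=8)

def make_assay(n, offset, scale):
    """Simulated assay readout: scale * (X @ w) + offset + Gaussian noise."""
    X = rng.normal(size=(n, w.size))
    return X, scale * (X @ w) + offset + rng.normal(scale=0.1, size=n)

assays = [make_assay(200, o, s) for o, s in [(0.0, 1.0), (0.0, 1.0), (3.0, 0.5), (2.0, 0.5)]]
X_val, y_val = make_assay(200, 0.0, 1.0)
scores = loo_assay_attribution(assays, X_val, y_val)

# Hypothetical 2-D vectors standing in for fine-tuned assay-description embeddings.
train_emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
order, sims = rank_by_embedding(train_emb, np.array([1.0, 0.0]))
selected = order[:2]  # keep the top-k ranked assays (k = 2 is an arbitrary choice here)
print("attribution:", np.round(scores, 3), "ranking:", order, "selected:", selected)
```

Leave-one-out retraining needs labelled validation data and one refit per assay, so it only works offline. This is presumably why the method transfers attribution signal into text embeddings, which can rank training data for a test set whose labels are unknown.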

Main Results:

  • Models trained on AssayMatch-selected data outperformed models trained on complete datasets.
  • AssayMatch effectively filters out noisy or incompatible experimental data.
  • Improved prediction capabilities were observed across different machine learning architectures and model-target pairs.

Conclusions:

  • AssayMatch offers a data-driven approach to curating high-quality datasets for drug discovery.
  • The framework reduces noise from incompatible experiments, enhancing model predictive power and data efficiency.
  • AssayMatch is a practical step toward optimizing the training data used by machine learning models in pharmaceutical research.