Data Exploration for Target Predictions Using Proprietary and Publicly Available Data Sets.

Aljoša Smajić¹, Thomas Steger-Hartmann², Gerhard F Ecker¹

  • ¹Department of Pharmaceutical Sciences, University of Vienna, Vienna 1090, Austria.

Chemical Research in Toxicology | April 20, 2025
Summary
This summary is machine-generated.

Combining bioactivity data from diverse sources to train machine learning (ML) models is common, but differences between data sources significantly affect prediction accuracy. Models trained on one data source perform poorly on another because the sources cover different regions of chemical space.

Area of Science:

  • Computational chemistry
  • Drug discovery
  • Machine learning in pharmacology

Background:

  • Machine learning (ML) models for bioactivity prediction often integrate data from various assay sources.
  • Differences in data domains and sources can lead to high variance in bioactivity values and distinct chemical space coverage.
  • The origin of training data significantly influences the effectiveness and applicability domain of ML prediction models.

Purpose of the Study:

  • To investigate the chemical space and active/inactive compound distribution of proprietary pharmaceutical data (Bayer AG) versus public data (ChEMBL).
  • To assess the impact of these data sources on the performance of ML classification models.
  • To explore strategies for creating robust mixed training datasets.

Main Methods:

  • Applied two descriptor sets and various ML algorithms to analyze Bayer AG and ChEMBL datasets.
  • Evaluated prediction performance using Matthews Correlation Coefficient (MCC) values.
  • Assessed chemical space overlap using mean Tanimoto similarity of nearest neighbors.
  • Investigated mixed training data strategies incorporating assay format and Tanimoto similarity.
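
The Matthews correlation coefficient condenses a binary confusion matrix into one value, MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), ranging from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction). This is why cross-dataset MCC values between -0.34 and 0.37 point to weak transfer. The following is a minimal sketch of the standard formula, not the authors' code; the function name `mcc`, the 1 = active / 0 = inactive encoding, the zero-denominator convention and the toy labels are all illustrative assumptions.

```python
import math
from typing import Sequence


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient for binary labels (1 = active, 0 = inactive; assumed encoding)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Tally the four cells of the 2x2 confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0 when any row or column of the matrix is empty.
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical cross-dataset check: true labels from one dataset,
# predictions from a model trained on the other.
print(mcc([1, 1, 0, 0], [1, 0, 0, 0]))  # ≈ 0.577
```

Unlike plain accuracy, MCC uses all four confusion-matrix cells, so it stays informative when actives and inactives are imbalanced, a common situation in bioactivity data.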

Main Results:

  • Substantial differences in chemical space were observed between Bayer AG and ChEMBL datasets.
  • Models trained on one dataset showed suboptimal performance when tested on the other (MCC values between -0.34 and 0.37).
  • Low mean Tanimoto similarity (≤0.3) indicated limited overlap in chemical space for many targets.
  • Methods assessing chemical space overlap did not reliably predict model performance across datasets.
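
Tanimoto similarity between two binary fingerprints A and B is |A ∩ B| / |A ∪ B|. One plausible reading of "mean Tanimoto similarity of nearest neighbors" is this: for each compound in one dataset, take its highest similarity to any compound in the other dataset, then average those maxima. The sketch below assumes that reading and represents each fingerprint as a set of on-bit indices instead of descriptors computed by a cheminformatics toolkit. The function names and toy fingerprints are hypothetical; only the ≤0.3 threshold comes from the results above.

```python
from typing import FrozenSet, Sequence

Fingerprint = FrozenSet[int]  # indices of "on" bits (assumed representation)


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity of two binary fingerprints."""
    union = len(a | b)
    if union == 0:
        return 0.0  # assumed convention for two all-zero fingerprints
    return len(a & b) / union


def mean_nn_tanimoto(query: Sequence[Fingerprint], reference: Sequence[Fingerprint]) -> float:
    """Average over query compounds of the similarity to their nearest reference neighbor."""
    if not query or not reference:
        raise ValueError("both compound sets must be non-empty")
    nearest = [max(tanimoto(q, r) for r in reference) for q in query]
    return sum(nearest) / len(nearest)


query = [frozenset({1, 2, 3}), frozenset({4, 5})]         # toy compounds from one source
reference = [frozenset({1, 2, 3, 4}), frozenset({6, 7})]  # toy compounds from the other source
score = mean_nn_tanimoto(query, reference)
print(score)  # ≈ 0.475
print("limited overlap" if score <= 0.3 else "some overlap")
```

The measure is asymmetric: swapping `query` and `reference` can give a different value, so a cross-dataset comparison may need to report both directions.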

Conclusions:

  • Data source heterogeneity significantly impacts ML model generalizability in bioactivity prediction.
  • Proprietary and public datasets often represent distinct chemical spaces, limiting cross-dataset model applicability.
  • Developing effective strategies for integrating diverse data sources, potentially using assay information, is crucial for improving model robustness.