Data Exploration for Target Predictions Using Proprietary and Publicly Available Data Sets.

Aljoša Smajić¹, Thomas Steger-Hartmann², Gerhard F Ecker¹

¹Department of Pharmaceutical Sciences, University of Vienna, Vienna 1090, Austria.

Chemical Research in Toxicology

April 20, 2025

Summary

This summary is machine-generated.

Combining diverse bioactivity data for machine learning (ML) models is common, but data source differences significantly impact prediction accuracy. Models trained on one data source perform poorly on another due to chemical space variations.

Area of Science:

Computational chemistry
Drug discovery
Machine learning in pharmacology

Background:

Machine learning (ML) models for bioactivity prediction often integrate data from various assay sources.
Differences in data domains and sources can lead to high variance in bioactivity values and distinct chemical space coverage.
The origin of training data significantly influences the effectiveness and applicability domain of ML prediction models.

Purpose of the Study:

To investigate the chemical space and active/inactive compound distribution of proprietary pharmaceutical data (Bayer AG) versus public data (ChEMBL).
To assess the impact of these data sources on the performance of ML classification models.
To explore strategies for creating robust mixed training datasets.

Main Methods:

Applied two descriptor sets and various ML algorithms to analyze Bayer AG and ChEMBL datasets.
Evaluated prediction performance using Matthews Correlation Coefficient (MCC) values.
Assessed chemical space overlap using mean Tanimoto similarity of nearest neighbors.
Investigated mixed training data strategies incorporating assay format and Tanimoto similarity.
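
The chemical space overlap measure named above, mean Tanimoto similarity of nearest neighbors, can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes each compound is already encoded as a set of fingerprint on-bit indices, and the function names and toy fingerprints are hypothetical.

```python
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen for this sketch when both fingerprints are empty
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def mean_nearest_neighbor_similarity(
    query: List[Set[int]], reference: List[Set[int]]
) -> float:
    """Average, over query compounds, of the highest similarity to any reference compound."""
    if not query or not reference:
        raise ValueError("both fingerprint lists must be non-empty")
    best = [max(tanimoto(q, r) for r in reference) for q in query]
    return sum(best) / len(best)


# Toy fingerprints (hypothetical on-bit indices, not real compounds).
proprietary = [{1, 2, 3, 4}, {10, 11, 12}]
public = [{1, 2, 3, 5}, {20, 21}]

score = mean_nearest_neighbor_similarity(proprietary, public)
print(round(score, 3))
```

Each query compound contributes only its single best match in the reference set, so the score is not symmetric: swapping the two datasets can give a different value.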

Main Results:

Substantial differences in chemical space were observed between Bayer AG and ChEMBL datasets.
Models trained on one dataset performed poorly when tested on the other (MCC values between -0.34 and 0.37, on a scale from -1 to 1).
Low mean Tanimoto similarity (≤0.3) indicated limited overlap in chemical space for many targets.
Methods assessing chemical space overlap did not reliably predict model performance across datasets.
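
To help interpret the reported MCC range, the sketch below computes the Matthews Correlation Coefficient from binary confusion-matrix counts. It is an illustration under stated assumptions, not the study's evaluation code: the function name and the example counts are hypothetical, and returning 0.0 when the denominator is zero is a convention adopted here.

```python
import math


def matthews_corrcoef_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC from true/false positive and negative counts of a binary classifier."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # convention for this sketch when a row or column of the matrix is empty
    return numerator / denominator


# Hypothetical counts illustrating the three reference points of the scale.
perfect = matthews_corrcoef_from_counts(tp=5, tn=5, fp=0, fn=0)
chance_level = matthews_corrcoef_from_counts(tp=3, tn=2, fp=3, fn=2)
inverted = matthews_corrcoef_from_counts(tp=0, tn=0, fp=5, fn=5)

print(perfect, chance_level, inverted)
```

A value near 0 means the predictions carry little information about the true labels, and a negative value means they tend to disagree with them, which is why a cross-dataset MCC of -0.34 signals more than just weak performance.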

Conclusions:

Data source heterogeneity significantly impacts ML model generalizability in bioactivity prediction.
Proprietary and public datasets often represent distinct chemical spaces, limiting cross-dataset model applicability.
Developing effective strategies for integrating diverse data sources, potentially using assay information, is crucial for improving model robustness.