Influence of Data Curation and Confidence Levels on Compound Predictions Using Machine Learning Models.

Elena Xerxa¹,², Martin Vogt¹,², Jürgen Bajorath¹,²,³

¹B-IT, Department of Life Science Informatics and Data Science, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 5/6, Bonn D-53115, Germany.

Journal of Chemical Information and Modeling

December 10, 2024

Summary

This summary is machine-generated.

Data curation significantly improves machine learning (ML) model performance in chemistry. Applying sequential curation to chemical data incrementally enhances classification accuracy by refining data quality and chemical space separation.

Area of Science:

Chemistry
Data Science
Machine Learning

Background:

Data curation is crucial in data science but often overlooked in chemical machine learning.
Evaluating the impact of data curation on molecular machine learning (ML) models is essential.

Purpose of the Study:

To assess the effects of data curation on the performance of molecular ML models.
To develop and evaluate a sequential curation scheme for compounds and activity data.

Main Methods:

A sequential curation scheme was developed for chemical compounds and activity data.
Machine learning classification models were generated at increasing data confidence levels.
Model performance was evaluated across different data curation levels.
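
The idea of a sequential curation scheme can be illustrated with a minimal Python sketch. The summary does not list the actual curation criteria, so the record fields (`potency_nm`, `exact_measurement`, `assay_confidence`), the step names, and the threshold below are all hypothetical placeholders, not the authors' scheme. The sketch only shows the general pattern: each filter is applied to the output of the previous one, and every intermediate data set is kept so that a model could be trained at each confidence level.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class ActivityRecord:
    """One compound/activity annotation (all fields are illustrative assumptions)."""
    compound_id: str
    potency_nm: Optional[float]   # measured potency in nM, None if missing
    exact_measurement: bool       # False for qualified values such as ">" or "<"
    assay_confidence: int         # hypothetical annotation score, higher is better


# Hypothetical curation steps, ordered from least to most stringent.
CURATION_STEPS: List[Tuple[str, Callable[[ActivityRecord], bool]]] = [
    ("has_potency", lambda r: r.potency_nm is not None),
    ("exact_value", lambda r: r.exact_measurement),
    ("high_assay_confidence", lambda r: r.assay_confidence >= 9),
]


def curate_sequentially(
    records: List[ActivityRecord],
) -> List[Tuple[str, List[ActivityRecord]]]:
    """Apply each filter to the survivors of the previous one.

    Returns one (level name, records) pair per confidence level,
    starting with the uncurated input.
    """
    levels = [("raw", list(records))]
    current = list(records)
    for name, keep in CURATION_STEPS:
        current = [r for r in current if keep(r)]
        levels.append((name, current))
    return levels


# Toy data, invented for illustration only.
records = [
    ActivityRecord("c1", 12.0, True, 9),
    ActivityRecord("c2", None, True, 9),
    ActivityRecord("c3", 500.0, False, 9),
    ActivityRecord("c4", 80.0, True, 5),
]
levels = curate_sequentially(records)
for name, subset in levels:
    print(name, len(subset))
```

Keeping every intermediate level, rather than only the final curated set, is what would allow classification performance to be compared across confidence levels.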

Main Results:

Systematic and incremental increases in classification performance were observed with sequential data curation.
Data curation enhanced the separation of compounds with different class labels in chemical space.
Elimination of singletons, rather than analogue series, primarily drove improved chemical space separation.
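
The distinction between singletons and analogue series can be made concrete with a short Python sketch. How analogue series were identified is not stated in this summary, so the sketch assumes a precomputed, hypothetical mapping from compound identifier to series identifier and treats any compound that is the sole member of its series as a singleton.

```python
from collections import Counter
from typing import Dict, List, Tuple


def split_singletons(series_of: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Split compounds into analogue-series members and singletons.

    series_of maps compound ID -> series ID (an assumed, precomputed input).
    A compound counts as a singleton if no other compound shares its series ID.
    """
    series_sizes = Counter(series_of.values())
    in_series = sorted(c for c, s in series_of.items() if series_sizes[s] > 1)
    singletons = sorted(c for c, s in series_of.items() if series_sizes[s] == 1)
    return in_series, singletons


# Toy assignment, invented for illustration only.
series_of = {"c1": "s1", "c2": "s1", "c3": "s2", "c4": "s3", "c5": "s3"}
in_series, singletons = split_singletons(series_of)
print("analogue series members:", in_series)
print("singletons:", singletons)
```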

Conclusions:

Stringent data curation directly leads to enhanced performance of ML models in chemical applications.
Varying data curation and confidence levels should be carefully considered when developing and evaluating chemical ML models.