Influence of Data Curation and Confidence Levels on Compound Predictions Using Machine Learning Models.

Elena Xerxa1,2, Martin Vogt1,2, Jürgen Bajorath1,2,3

  • 1 B-IT, Department of Life Science Informatics and Data Science, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 5/6, Bonn D-53115, Germany.

Journal of Chemical Information and Modeling | December 10, 2024
Summary
This summary is machine-generated.

Data curation significantly improves machine learning (ML) model performance in chemistry. Applying sequential curation to chemical data incrementally enhances classification accuracy by refining data quality and chemical space separation.

Area of Science:

  • Chemistry
  • Data Science
  • Machine Learning

Background:

  • Data curation is crucial in data science but often overlooked in chemical machine learning.
  • Evaluating the impact of data curation on molecular machine learning (ML) models is essential.

Purpose of the Study:

  • To assess the effects of data curation on the performance of molecular ML models.
  • To develop and evaluate a sequential curation scheme for compounds and activity data.

Main Methods:

  • A sequential curation scheme was developed for chemical compounds and activity data.
  • Machine learning classification models were generated at increasing data confidence levels.
  • Model performance was evaluated across different data curation levels.
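
The idea of generating data sets at increasing confidence levels can be sketched as a chain of cumulative filters. This is only an illustration: the summary does not say which curation criteria the authors used, so the `Record` fields, the three filter steps, and the toy records below are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Record:
    """One activity record; every field here is a hypothetical stand-in."""
    compound_id: str
    potency_nm: Optional[float]  # measured potency in nM, None if missing
    relation: str                # "=", ">" or "<" qualifier on the value
    n_measurements: int          # number of independent measurements


# Hypothetical curation steps, ordered from lenient to stringent.
CURATION_STEPS: List[Tuple[str, Callable[[Record], bool]]] = [
    ("has_potency", lambda r: r.potency_nm is not None),
    ("exact_relation", lambda r: r.relation == "="),
    ("replicated", lambda r: r.n_measurements >= 2),
]


def sequential_curation(records: List[Record]) -> List[Tuple[str, List[Record]]]:
    """Return one data set per confidence level.

    Level 0 is the raw data; level k keeps only records that pass steps 1..k,
    so each level is a subset of the one before it.
    """
    levels = [("raw", list(records))]
    current = list(records)
    for name, keep in CURATION_STEPS:
        current = [r for r in current if keep(r)]
        levels.append((name, current))
    return levels


# Toy records invented for illustration; they are not data from the study.
TOY_RECORDS = [
    Record("c1", 12.0, "=", 3),
    Record("c2", 150.0, "=", 1),
    Record("c3", 10000.0, ">", 2),
    Record("c4", None, "=", 1),
]

if __name__ == "__main__":
    for name, subset in sequential_curation(TOY_RECORDS):
        print(f"{name}: {len(subset)} records")
```

In a fuller workflow, a classifier would be trained and evaluated separately on each returned level so that performance can be compared across curation stages.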

Main Results:

  • Systematic and incremental increases in classification performance were observed with sequential data curation.
  • Data curation enhanced the separation of compounds with different class labels in chemical space.
  • Elimination of singletons, rather than analogue series, primarily drove improved chemical space separation.
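
One way to see how removing singletons could change class separation is a small sketch like the following. It assumes that each compound already carries an analogue-series identifier and a set-based fingerprint, and it uses nearest-neighbor label agreement under Tanimoto similarity as a stand-in for separation in chemical space. The summary does not state which separation measure the study used, and the toy compounds are constructed for illustration, not taken from the paper.

```python
from collections import Counter
from typing import FrozenSet, List, Tuple

# (compound id, analogue-series id, class label, fingerprint bit set);
# this layout is a hypothetical simplification.
Compound = Tuple[str, str, int, FrozenSet[int]]


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity of two bit sets: |intersection| / |union|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def drop_singletons(compounds: List[Compound]) -> List[Compound]:
    """Keep only compounds whose series has more than one member."""
    sizes = Counter(series for _, series, _, _ in compounds)
    return [c for c in compounds if sizes[c[1]] > 1]


def nn_label_agreement(compounds: List[Compound]) -> float:
    """Fraction of compounds whose most similar other compound shares their label."""
    if len(compounds) < 2:
        return 0.0
    hits = 0
    for i, (_, _, label, fp) in enumerate(compounds):
        others = [c for j, c in enumerate(compounds) if j != i]
        nearest = max(others, key=lambda c: tanimoto(fp, c[3]))
        hits += nearest[2] == label
    return hits / len(compounds)


# Toy compounds: two analogue series plus one singleton ("x1") that sits
# close to the active series but carries the other label.
TOY_COMPOUNDS: List[Compound] = [
    ("a1", "S1", 1, frozenset({1, 2, 3})),
    ("a2", "S1", 1, frozenset({1, 2, 4})),
    ("b1", "S2", 0, frozenset({7, 8, 9})),
    ("b2", "S2", 0, frozenset({7, 8, 10})),
    ("x1", "S3", 0, frozenset({1, 2, 3, 4})),
]

if __name__ == "__main__":
    curated = drop_singletons(TOY_COMPOUNDS)
    print("agreement before:", nn_label_agreement(TOY_COMPOUNDS))
    print("agreement after: ", nn_label_agreement(curated))
```

The toy set is deliberately arranged so that the singleton blurs the class boundary. It shows the mechanism only and says nothing about effect sizes in real data.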

Conclusions:

  • Stringent data curation directly leads to enhanced performance of ML models in chemical applications.
  • Varying data curation and confidence levels should be carefully considered when developing and evaluating chemical ML models.