Evaluating High-Variance Leaves as Uncertainty Measure for Random Forest Regression.

Thomas-Martin Dutschmann¹, Knut Baumann¹

¹Institute for Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, Beethovenstraße 55, 38106 Braunschweig, Germany.

Molecules (Basel, Switzerland)

November 13, 2021

Summary

This summary is machine-generated.

High-variance leaves in Random Forests offer a meaningful uncertainty measure for molecular property prediction, but they do not outperform the standard deviation of the ensemble predictions. The finding matters for judging how far the predictions of drug design models can be trusted.

Keywords:

Random Forest; chemoinformatics; ensemble; machine learning; regression; reliability measure; uncertainty measure

Area of Science:

Machine Learning
Chemoinformatics
Drug Design

Background:

Uncertainty measures are vital for assessing predictive model reliability, particularly in drug design for molecular property prediction.
Random Forests, a traditional machine learning technique in chemoinformatics, inherently provide uncertainty measures through their ensemble of decision trees.
The standard deviation of ensemble predictions is the conventional uncertainty measure for Random Forests.
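
As a minimal sketch of the conventional measure, the snippet below computes, for each sample, the mean and the standard deviation across the per-tree predictions. The function name and the toy numbers are hypothetical, and the use of the population standard deviation is an assumption, since the summary does not say which variant the authors used. With scikit-learn, the per-tree predictions could be collected from a fitted forest's `estimators_` attribute.

```python
from statistics import mean, pstdev


def ensemble_mean_and_std(per_tree_predictions):
    """Return (means, stds) per sample.

    per_tree_predictions[t][i] is the prediction of tree t for sample i.
    The mean is the forest prediction; the spread across trees is the
    conventional uncertainty estimate.
    """
    n_samples = len(per_tree_predictions[0])
    means, stds = [], []
    for i in range(n_samples):
        column = [tree[i] for tree in per_tree_predictions]
        means.append(mean(column))
        # Assumption: population standard deviation (divide by n).
        stds.append(pstdev(column))
    return means, stds


# Hypothetical predictions of three trees for two molecules.
per_tree = [
    [1.0, 2.0],
    [1.0, 4.0],
    [1.0, 6.0],
]
forest_prediction, forest_uncertainty = ensemble_mean_and_std(per_tree)
```

In this toy case the trees agree on the first molecule, so its uncertainty is zero, while they disagree on the second, which yields a positive spread.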

Purpose of the Study:

To investigate novel uncertainty measures for Random Forests in molecular property prediction.
To evaluate the performance of high-variance leaves as an uncertainty measure compared to standard ensemble deviation.
To run large-scale evaluations so that claims about the efficacy of each uncertainty measure are robust.

Main Methods:

Utilized Random Forests for molecular property prediction tasks.
Introduced and analyzed high-variance leaves within decision trees as a novel uncertainty metric.
Performed large-scale comparisons across multiple chemoinformatic regression datasets.
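
The summary does not define how the variance of a leaf is turned into an uncertainty value, so the following is only one plausible reading, not the authors' method: for each tree, take the variance of the training targets that fall into the leaf reached by the query sample, then average over trees. All function names are hypothetical. With scikit-learn, the leaf indices could be obtained from the fitted forest's `apply` method.

```python
from collections import defaultdict
from statistics import mean, pvariance


def leaf_target_variances(train_leaf_ids, train_targets):
    """Variance of the training targets in each leaf of ONE tree."""
    groups = defaultdict(list)
    for leaf, y in zip(train_leaf_ids, train_targets):
        groups[leaf].append(y)
    # A leaf holding a single training sample has variance 0.
    return {leaf: pvariance(ys) for leaf, ys in groups.items()}


def mean_leaf_variance(train_leaves_per_tree, train_targets, query_leaves_per_tree):
    """Assumed uncertainty: average, over trees, of the target variance
    in the leaf that the query sample lands in."""
    variances = []
    for train_leaves, query_leaf in zip(train_leaves_per_tree, query_leaves_per_tree):
        per_leaf = leaf_target_variances(train_leaves, train_targets)
        variances.append(per_leaf[query_leaf])
    return mean(variances)


# Hypothetical forest of two trees and four training molecules.
targets = [1.0, 3.0, 10.0, 10.0]
train_leaves = [
    [0, 0, 1, 1],  # leaf index of each training sample in tree 0
    [0, 1, 1, 1],  # leaf index of each training sample in tree 1
]
query_leaves = [0, 1]  # the query reaches leaf 0 in tree 0, leaf 1 in tree 1
uncertainty = mean_leaf_variance(train_leaves, targets, query_leaves)
```

A query that lands mostly in leaves whose training targets disagree receives a high value under this reading, while a query routed to homogeneous leaves receives a low one.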

Main Results:

High-variance leaf uncertainty was found to be a meaningful measure of model reliability.
This novel measure did not consistently outperform the standard deviation of the ensemble predictions as an uncertainty measure.
The performance of high-variance leaves varied depending on the specific dataset characteristics.
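
One common way to check whether an uncertainty measure is meaningful is to test whether higher uncertainty goes along with larger absolute prediction errors. The sketch below uses a Spearman rank correlation for this. The choice of metric is an assumption, since the summary does not state how the authors scored the measures, and all names and numbers are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(a, b):
    """Pearson correlation; undefined (division by zero) for constant input."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(a, b):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(average_ranks(a), average_ranks(b))


# Hypothetical test-set values.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 2.5, 2.0, 6.0]
uncertainty = [0.2, 0.6, 0.9, 1.5]
abs_error = [abs(t - p) for t, p in zip(y_true, y_pred)]
rho = spearman(uncertainty, abs_error)
```

Computing such a score for both the ensemble standard deviation and a leaf-based measure on the same test set would give one simple basis for comparing them per dataset.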

Conclusions:

While high-variance leaves offer a valid approach to uncertainty estimation in Random Forests, they do not surpass the established ensemble standard deviation.
Further research may be needed to refine or combine uncertainty measures for enhanced reliability in machine learning models for drug design.
Large-scale validation is essential for generalizing the performance of uncertainty quantification techniques.