



Evaluating High-Variance Leaves as Uncertainty Measure for Random Forest Regression.

Thomas-Martin Dutschmann¹, Knut Baumann¹

  • ¹Institute for Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, Beethovenstraße 55, 38106 Braunschweig, Germany.

Molecules (Basel, Switzerland) | November 13, 2021
Summary
This summary is machine-generated.

High-variance leaves in Random Forests offer a meaningful uncertainty measure for molecular property prediction, but they do not outperform the standard deviation of the ensemble's predictions. The result matters for building drug design models whose predictions can be trusted.

Keywords:
Random Forest, chemoinformatics, ensemble, machine learning, regression, reliability measure, uncertainty measure


Area of Science:

  • Machine Learning
  • Chemoinformatics
  • Drug Design

Background:

  • Uncertainty measures are vital for assessing predictive model reliability, particularly in drug design for molecular property prediction.
  • Random Forests, a traditional machine learning technique in chemoinformatics, inherently provide uncertainty measures through their ensemble of decision trees.
  • The standard deviation of ensemble predictions is the conventional uncertainty measure for Random Forests.
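
The conventional measure described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes scikit-learn's RandomForestRegressor, a synthetic dataset from make_regression in place of molecular descriptors, and a hypothetical helper named ensemble_mean_and_std.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor


def ensemble_mean_and_std(forest, X):
    """Return the per-sample mean and standard deviation over the trees' predictions.

    Hypothetical helper for illustration; the name is not taken from the paper.
    """
    # Shape (n_trees, n_samples): one row of predictions per decision tree.
    per_tree = np.stack([tree.predict(X) for tree in forest.estimators_])
    return per_tree.mean(axis=0), per_tree.std(axis=0)


# Synthetic stand-in data (assumption): 150 training and 50 test samples.
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)
X_train, y_train, X_test = X[:150], y[:150], X[150:]

forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_train, y_train)
mean_pred, std_pred = ensemble_mean_and_std(forest, X_test)
```

Here mean_pred matches the forest's usual prediction, while std_pred grows where the trees disagree, which is what makes it usable as a per-sample uncertainty estimate.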

Purpose of the Study:

  • To investigate novel uncertainty measures for Random Forests in molecular property prediction.
  • To evaluate the performance of high-variance leaves as an uncertainty measure compared to the standard deviation of ensemble predictions.
  • To conduct large-scale estimations for robust claims on uncertainty measure efficacy.

Main Methods:

  • Utilized Random Forests for molecular property prediction tasks.
  • Introduced and analyzed high-variance leaves within decision trees as a novel uncertainty metric.
  • Performed large-scale comparisons across multiple chemoinformatic regression datasets.
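
This summary does not define how the leaf-based measure is computed, so the following is only one plausible reading, labeled as an assumption: each tree is grown with leaves that hold several training samples, and a query sample's uncertainty is the variance of the training targets in the leaf it lands in, averaged over all trees. The sketch assumes scikit-learn with its default squared-error criterion, for which a node's stored impurity equals the variance of its training targets. The helper name leaf_variance_uncertainty and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor


def leaf_variance_uncertainty(forest, X):
    """Average, over trees, the target variance of the leaf each sample falls into.

    Assumption: the forest uses the squared-error criterion, so
    tree_.impurity holds the variance of training targets per node.
    """
    per_tree = []
    for tree in forest.estimators_:
        leaf_ids = tree.apply(X)                        # leaf index for each sample
        per_tree.append(tree.tree_.impurity[leaf_ids])  # variance stored at that leaf
    return np.mean(per_tree, axis=0)


# Synthetic stand-in data (assumption): 150 training and 50 test samples.
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)
X_train, y_train, X_test = X[:150], y[:150], X[150:]

# min_samples_leaf > 1 keeps several samples per leaf, so leaf variance is informative.
forest = RandomForestRegressor(
    n_estimators=50, min_samples_leaf=10, random_state=0
).fit(X_train, y_train)
leaf_uncertainty = leaf_variance_uncertainty(forest, X_test)
```

With fully grown trees (min_samples_leaf=1) most leaves would hold a single target value and the measure would collapse toward zero, which is why the sketch forces larger leaves.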

Main Results:

  • High-variance leaf uncertainty was found to be a meaningful measure of model reliability.
  • This novel measure did not consistently outperform the standard deviation of ensemble predictions in predictive accuracy.
  • The performance of high-variance leaves varied depending on the specific dataset characteristics.

Conclusions:

  • While high-variance leaves offer a valid approach to uncertainty estimation in Random Forests, they do not surpass the established standard deviation of ensemble predictions.
  • Further research may be needed to refine or combine uncertainty measures for enhanced reliability in machine learning models for drug design.
  • Large-scale validation is essential for generalizing the performance of uncertainty quantification techniques.