Standard errors and confidence intervals for variable importance in random forest regression, classification, and

Hemant Ishwaran¹, Min Lu¹

¹Division of Biostatistics, Miller School of Medicine, University of Miami, Miami, Florida, USA.

Statistics in Medicine

June 6, 2018

Summary

This summary is machine-generated.

We introduce a subsampling method to estimate the variance of variable importance (VIMP) in random forests. The approach is computationally fast, makes confidence intervals for VIMP possible, and scales to large datasets.

Keywords:

VIMP; bootstrap; delete-d jackknife; permutation importance; prediction error; subsampling

Area of Science:

Machine Learning
Statistical Modeling
Data Science

Background:

Random forests are widely used for prediction and provide a nonparametric measure of variable importance (VIMP).
A key limitation of VIMP is the lack of systematic methods for variance estimation.
This hinders the construction of reliable confidence intervals for VIMP.
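
To make the idea of permutation-based variable importance concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical fixed predictor (`toy_predict`) in place of a trained random forest, uses all of the data rather than out-of-bag cases, and measures importance as the increase in mean squared error after one column is shuffled. All function and variable names are illustrative.

```python
import random


def mse(predict, X, y):
    """Mean squared prediction error of `predict` on rows X with targets y."""
    return sum((predict(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, j, rng):
    """Increase in error after shuffling column j (larger = more important)."""
    baseline = mse(predict, X, y)
    column = [row[j] for row in X]
    rng.shuffle(column)  # break the link between column j and the target
    X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
    return mse(predict, X_perm, y) - baseline


# Toy data (assumption): column 0 drives the response, column 1 is pure noise.
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
y = [2.0 * row[0] + rng.gauss(0, 0.1) for row in X]


def toy_predict(row):
    """Hypothetical stand-in for a fitted model; it ignores column 1."""
    return 2.0 * row[0]


vimp = [permutation_importance(toy_predict, X, y, j, rng) for j in range(2)]
```

Because the stand-in predictor never reads column 1, shuffling that column leaves the error unchanged, while shuffling column 0 raises it sharply. A single number like this carries no measure of its own uncertainty, which is the gap the study addresses.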

Purpose of the Study:

To propose a novel subsampling approach for estimating VIMP variance.
To enable the construction of confidence intervals for VIMP in random forests.
To offer a computationally efficient solution applicable to large datasets.

Main Methods:

A subsampling strategy is developed to estimate the variance of VIMP.
The method is validated through extensive simulations across various settings (regression, classification, survival).
Comparison with existing methods like the delete-d jackknife and bootstrap estimators is performed.
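
The two subsampling-based variance estimators named above can be sketched for a generic statistic. This is an illustration under stated assumptions rather than the paper's code: the sample mean stands in for VIMP, subsamples of size `b` are drawn without replacement, and the scaling factors `b / n` (subsampling) and `b / (n - b)` (delete-d jackknife with `d = n - b`, centered at the full-sample estimate) follow one common textbook form of these estimators. All names are hypothetical.

```python
import random


def subsample_estimates(data, statistic, b, K, rng):
    """Evaluate `statistic` on K subsamples of size b drawn without replacement."""
    return [statistic(rng.sample(data, b)) for _ in range(K)]


def subsampling_variance(estimates, n, b):
    """Subsampling estimate: spread around the subsample average, scaled by b/n."""
    K = len(estimates)
    center = sum(estimates) / K
    return (b / n) * sum((e - center) ** 2 for e in estimates) / K


def delete_d_jackknife_variance(estimates, full_estimate, n, b):
    """Delete-d jackknife (d = n - b): spread around the full-sample estimate."""
    K = len(estimates)
    return (b / (n - b)) * sum((e - full_estimate) ** 2 for e in estimates) / K


def mean(xs):
    return sum(xs) / len(xs)


# Toy setting (assumption): standard normal data, so Var(mean) is about 1/n.
rng = random.Random(1)
n, b, K = 400, 100, 500
data = [rng.gauss(0, 1) for _ in range(n)]

estimates = subsample_estimates(data, mean, b, K, rng)
v_sub = subsampling_variance(estimates, n, b)
v_jk = delete_d_jackknife_variance(estimates, mean(data), n, b)
```

In this toy setting the jackknife value lands near the textbook `1 / n`, while the plain subsampling value is smaller by roughly a factor of `1 - b / n`. For VIMP, each call to `statistic` would mean refitting a forest on the subsample, so the cost is driven by `K` and `b`.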

Main Results:

The proposed subsampling method effectively estimates VIMP variance and facilitates confidence interval construction.
The delete-d jackknife variance estimator shows particular efficacy at low subsampling rates due to bias correction.
Subsampling-based estimators demonstrate competitive performance against bootstrap methods, especially in handling ties.
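
Once a variance estimate is available, one simple way to turn it into a confidence interval is a normal approximation. The sketch below assumes the estimator is approximately normally distributed, which the summary above does not state explicitly. The function name and the example numbers are illustrative only.

```python
from statistics import NormalDist


def normal_confidence_interval(estimate, variance, level=0.95):
    """Two-sided interval: estimate +/- z * sqrt(variance), normal approximation."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for level=0.95
    half_width = z * variance ** 0.5
    return estimate - half_width, estimate + half_width


# Hypothetical VIMP of 0.5 with an estimated variance of 0.01.
lo, hi = normal_confidence_interval(0.5, 0.01)
```

An interval that excludes zero suggests the variable carries predictive signal beyond noise, subject to the quality of the variance estimate and the normal approximation.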

Conclusions:

Subsampling offers a fast and effective solution for VIMP variance estimation in random forests.
This method is broadly applicable across diverse data analysis problems, including big data scenarios.
The proposed technique enhances the interpretability and reliability of VIMP in machine learning models.