Standard errors and confidence intervals for variable importance in random forest regression, classification, and

Hemant Ishwaran1, Min Lu1

  • 1Division of Biostatistics, Miller School of Medicine, University of Miami, Miami, Florida, USA.

Statistics in Medicine | June 6, 2018
Summary
This summary is machine-generated.

We introduce a subsampling method to estimate the variance of variable importance (VIMP) in random forests. This computationally fast approach enables confidence intervals and is ideal for big data analysis.

Keywords:
VIMP, bootstrap, delete-d jackknife, permutation importance, prediction error, subsampling

Area of Science:

  • Machine Learning
  • Statistical Modeling
  • Data Science

Background:

  • Random forests are widely used for prediction and provide a nonparametric measure of variable importance (VIMP).
  • A key limitation of VIMP is the lack of systematic methods for variance estimation.
  • This hinders the construction of reliable confidence intervals for VIMP.

Purpose of the Study:

  • To propose a novel subsampling approach for estimating VIMP variance.
  • To enable the construction of confidence intervals for VIMP in random forests.
  • To offer a computationally efficient solution applicable to large datasets.

Main Methods:

  • A subsampling strategy is developed to estimate the variance of VIMP.
  • The method is validated through extensive simulations across various settings (regression, classification, survival).
  • Comparison with existing methods like the delete-d jackknife and bootstrap estimators is performed.
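The subsampling idea in the methods above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the statistic is a stand-in (the sample mean) rather than VIMP from a fitted random forest, the function names `subsample_variance` and `normal_ci` are hypothetical, and the variance formula is the commonly stated subsampling form, in which the spread of the statistic over subsamples of size b is rescaled by b/n. The normal-approximation interval is likewise an assumption.

```python
import random
import statistics
from typing import Callable, Sequence, Tuple


def subsample_variance(
    data: Sequence[float],
    statistic: Callable[[Sequence[float]], float],
    b: int,
    n_subsamples: int = 200,
    seed: int = 0,
) -> float:
    """Subsampling estimate of the variance of statistic(data).

    `statistic` stands in for VIMP; a real use would refit a forest and
    recompute VIMP on every subsample.
    """
    n = len(data)
    if not 1 <= b < n:
        raise ValueError("subsample size b must satisfy 1 <= b < n")
    rng = random.Random(seed)
    pool = list(data)
    # Recompute the statistic on subsamples drawn WITHOUT replacement.
    values = [statistic(rng.sample(pool, b)) for _ in range(n_subsamples)]
    center = statistics.fmean(values)
    spread = sum((v - center) ** 2 for v in values) / n_subsamples
    # Rescale from subsample size b to the full sample size n.
    return (b / n) * spread


def normal_ci(estimate: float, variance: float, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation interval (z = 1.96 gives roughly 95% coverage)."""
    half_width = z * variance ** 0.5
    return estimate - half_width, estimate + half_width


# Toy demonstration on simulated data (not data from the paper).
rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(400)]
estimate = statistics.fmean(data)
var_hat = subsample_variance(data, statistics.fmean, b=40)
low, high = normal_ci(estimate, var_hat)
print(f"estimate={estimate:.4f} variance={var_hat:.5f} CI=({low:.4f}, {high:.4f})")
```

Because each subsample has only b observations, every recomputation is cheaper than one on the full data, which is the source of the speed advantage on large datasets.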

Main Results:

  • The proposed subsampling method effectively estimates VIMP variance and facilitates confidence interval construction.
  • The delete-d jackknife variance estimator shows particular efficacy at low subsampling rates due to bias correction.
  • Subsampling-based estimators demonstrate competitive performance against bootstrap methods, especially in handling ties.
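To make the contrast between the two subsample-based estimators concrete, the sketch below computes both from the same set of subsamples. It assumes the textbook forms: the subsampling estimator centers on the mean of the subsampled values and scales by b/n, while the delete-d jackknife (with d = n - b) centers on the full-sample estimate and scales by b/(n - b). The function name `subsampling_and_jackknife` is hypothetical, and the sample mean again stands in for VIMP.

```python
import random
import statistics
from typing import Callable, Sequence, Tuple


def subsampling_and_jackknife(
    data: Sequence[float],
    statistic: Callable[[Sequence[float]], float],
    b: int,
    n_subsamples: int = 200,
    seed: int = 0,
) -> Tuple[float, float]:
    """Return (subsampling variance, delete-d jackknife variance), d = n - b."""
    n = len(data)
    if not 1 <= b < n:
        raise ValueError("subsample size b must satisfy 1 <= b < n")
    rng = random.Random(seed)
    pool = list(data)
    values = [statistic(rng.sample(pool, b)) for _ in range(n_subsamples)]

    full_estimate = statistic(pool)       # statistic on all n observations
    sub_mean = statistics.fmean(values)   # average over the subsamples

    # Subsampling: deviations from the subsample average, scaled by b / n.
    v_sub = (b / n) * sum((v - sub_mean) ** 2 for v in values) / n_subsamples
    # Delete-d jackknife: deviations from the full-sample estimate,
    # scaled by b / (n - b).
    v_jack = (b / (n - b)) * sum((v - full_estimate) ** 2 for v in values) / n_subsamples
    return v_sub, v_jack


# Toy demonstration on simulated data (not data from the paper).
rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(400)]
v_sub, v_jack = subsampling_and_jackknife(data, statistics.fmean, b=40)
print(f"subsampling={v_sub:.5f} delete-d jackknife={v_jack:.5f}")
```

In this form the jackknife value is never smaller than the subsampling value, since deviations are measured from the full-sample estimate and the scaling factor is larger; the gap is widest when b is not small relative to n.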

Conclusions:

  • Subsampling offers a fast and effective solution for VIMP variance estimation in random forests.
  • This method is broadly applicable across diverse data analysis problems, including big data scenarios.
  • The proposed technique enhances the interpretability and reliability of VIMP in machine learning models.