The behaviour of random forest permutation-based variable importance measures under predictor correlation.

Kristin K Nicodemus¹, James D Malley, Carolin Strobl

¹Statistical Genetics, Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK. kristin.nicodemus@well.ox.ac.uk

BMC Bioinformatics

March 2, 2010

Summary

This summary is machine-generated.

Unconditional random forest (RF) variable importance measures (VIMs) assign higher importance to correlated predictors when those predictors are associated with the outcome. Unconditional, unscaled VIMs are unbiased under the null hypothesis and computationally efficient, which makes them a practical choice for large datasets.

Area of Science:

Bioinformatics
Statistical Genetics
Machine Learning

Background:

Random forests (RF) are widely applied in genetic association and microarray studies.
High predictor correlation is common in these applications.
Conflicting conclusions exist regarding RF variable importance measures (VIMs).

Purpose of the Study:

To synthesize contradictory findings on RF VIMs.
To evaluate RF VIM performance under predictor correlation.
To clarify the behavior of different VIM types.

Main Methods:

Extended simulation study.
Analysis of permutation-based VIMs in RF.
Comparison of unconditional, conditional, and scaled VIMs.
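As a rough illustration of what an unconditional permutation-based VIM measures, the sketch below shuffles one predictor at a time and records the increase in prediction error. It is a minimal sketch under stated assumptions, not the study's simulation code: the data, the names `permutation_importance` and `toy_predict`, and the hand-written model are all hypothetical, and the model is a stand-in for a fitted forest. In a random forest the same calculation is carried out per tree on out-of-bag observations and then averaged over trees.

```python
import random


def mean_squared_error(y_true, y_pred):
    """Average squared difference between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, column, rng):
    """Unconditional permutation importance of one predictor.

    Returns the increase in mean squared error after the values in
    `column` are shuffled across all rows, which breaks the link between
    that predictor and both the outcome and the other predictors.
    """
    baseline = mean_squared_error(y, [predict(row) for row in X])
    shuffled = [row[column] for row in X]
    rng.shuffle(shuffled)
    X_perm = [row[:column] + [v] + row[column + 1:] for row, v in zip(X, shuffled)]
    permuted = mean_squared_error(y, [predict(row) for row in X_perm])
    return permuted - baseline


# Hypothetical toy data: the outcome depends on x0 only,
# x1 is correlated with x0, and x2 is independent noise.
rng = random.Random(0)
X = []
for _ in range(500):
    x0 = rng.gauss(0, 1)
    x1 = x0 + rng.gauss(0, 0.5)
    x2 = rng.gauss(0, 1)
    X.append([x0, x1, x2])
y = [2.0 * row[0] + rng.gauss(0, 0.1) for row in X]


def toy_predict(row):
    # Assumed stand-in for a fitted model that splits its weight between
    # x0 and its correlated proxy x1 and ignores x2.
    return row[0] + row[1]


importances = [permutation_importance(toy_predict, X, y, j, rng) for j in range(3)]
for j, value in enumerate(importances):
    print(f"x{j}: {value:.3f}")
```

In this toy setting x1 receives a clearly positive importance although it has no effect of its own on the outcome, because the assumed model relies on it as a proxy for x0. The unused predictor x2 scores exactly zero, since shuffling it leaves every prediction unchanged.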

Main Results:

Unconditional RF VIMs assign higher importance to correlated predictors when predictors are associated with the outcome (the alternative hypothesis, HA), but are unbiased under the null hypothesis (H0).
Conditional VIMs assign lower importance to correlated predictors under HA and are unbiased under H0.
Scaled VIMs are biased under both HA and H0.
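For readers unfamiliar with the distinction between unscaled and scaled VIMs, the block below writes out one common formulation for classification forests. It is a sketch of the usual textbook definitions, not a formula taken from this summary, and all of the notation is introduced here: T is the number of trees, B^(t) is the out-of-bag sample of tree t, ŷ_i^(t) is the prediction of tree t for observation i, ŷ_{i,π_j}^(t) is the prediction after permuting predictor X_j, and σ̂_j is the standard deviation of the per-tree importances.

```latex
\begin{align}
\mathrm{VI}^{(t)}(X_j) &= \frac{1}{|\mathcal{B}^{(t)}|} \sum_{i \in \mathcal{B}^{(t)}}
  \Bigl[ I\bigl(y_i = \hat{y}_i^{(t)}\bigr) - I\bigl(y_i = \hat{y}_{i,\pi_j}^{(t)}\bigr) \Bigr]
  && \text{(per-tree importance)} \\
\mathrm{VI}(X_j) &= \frac{1}{T} \sum_{t=1}^{T} \mathrm{VI}^{(t)}(X_j)
  && \text{(unscaled VIM)} \\
z(X_j) &= \frac{\mathrm{VI}(X_j)}{\hat{\sigma}_j / \sqrt{T}}
  && \text{(scaled VIM)}
\end{align}
```

Under this formulation the denominator of the scaled measure shrinks as T grows, so the scaled value depends on the number of trees as well as on the predictor itself.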

Conclusions:

Unconditional unscaled VIMs are computationally efficient and unbiased under H0.
Whether higher VIMs for correlated predictors are desirable depends on the application.
Higher importance for correlated predictors can be an advantage in genetic association studies, but it can also produce spurious signals.