Conditional variable importance for random forests.

Carolin Strobl¹, Anne-Laure Boulesteix, Thomas Kneib

  • ¹ Department of Statistics, Ludwig-Maximilians-Universität München, Ludwigstrasse 33, D-80539 München, Germany. carolin.strobl@stat.uni-muenchen.de

BMC Bioinformatics, July 16, 2008
Summary
This summary is machine-generated.

The variable importance measures of random forests are biased toward correlated predictor variables. A new conditional permutation scheme is developed that more reliably reflects the true impact of each predictor in the model.

Area of Science:

  • Machine learning
  • Statistical modeling
  • Bioinformatics

Background:

  • Random forests are popular for “small n, large p” problems and for problems with complex interactions.

Purpose of the Study:

  • To address the bias in random forest variable importance measures caused by correlated predictors.
  • To develop an improved method for calculating variable importance that accounts for predictor correlations.

Main Methods:

  • Investigated mechanisms causing bias in random forest variable importance.
  • Developed a conditional permutation scheme for variable importance calculation.
  • Compared the new conditional scheme against the standard unconditional approach.
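The contrast between the two permutation schemes can be illustrated with a minimal, self-contained Python sketch. It is not the authors' implementation and makes several simplifying assumptions: instead of a random forest it uses a fixed, hypothetical prediction function `model`; importance is the increase in mean squared error on the full data rather than per tree on out-of-bag observations; and the conditioning grid is built from quantile bins of the correlated predictor `x1` (hypothetical helper `quantile_bins`), a simplified stand-in for the partition that, as we understand the paper, is derived from each tree's split points. All names and the simulated data are illustrative.

```python
import random
import statistics


def mse(model, rows, y):
    """Mean squared error of `model` over `rows` (tuples of predictor values)."""
    return statistics.fmean((model(r) - t) ** 2 for r, t in zip(rows, y))


def permute_column(rows, j, rng, strata=None):
    """Return a copy of `rows` with column j shuffled.

    With strata=None, column j is shuffled across all rows (unconditional scheme).
    Otherwise it is shuffled only among rows sharing a stratum label (conditional scheme).
    """
    labels = strata if strata is not None else [0] * len(rows)
    groups = {}
    for i, g in enumerate(labels):
        groups.setdefault(g, []).append(i)
    new_col = [r[j] for r in rows]
    for idx in groups.values():
        vals = [rows[i][j] for i in idx]
        rng.shuffle(vals)  # shuffle within this stratum only
        for i, v in zip(idx, vals):
            new_col[i] = v
    return [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, new_col)]


def permutation_importance(model, rows, y, j, rng, strata=None, repeats=20):
    """Average increase in MSE after permuting column j, over `repeats` shuffles."""
    base = mse(model, rows, y)
    return statistics.fmean(
        mse(model, permute_column(rows, j, rng, strata), y) - base
        for _ in range(repeats)
    )


def quantile_bins(values, n_bins):
    """Hypothetical helper: label each value with its quantile bin (0..n_bins-1)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * n_bins // len(values)
    return labels


rng = random.Random(0)
n = 2000
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [a + rng.gauss(0, 0.1) for a in x1]     # x2 is strongly correlated with x1
y = [2 * a + rng.gauss(0, 0.5) for a in x1]  # the response depends on x1 only
rows = list(zip(x1, x2))


def model(r):
    # Hypothetical fitted model that spreads its weight over both correlated predictors.
    return r[0] + r[1]


uncond = permutation_importance(model, rows, y, j=1, rng=rng)
cond = permutation_importance(model, rows, y, j=1, rng=rng, strata=quantile_bins(x1, 50))
print(f"unconditional importance of x2: {uncond:.3f}")
print(f"conditional importance of x2:   {cond:.3f}")
```

In this setup the unconditional score for `x2` should come out at roughly 2, because shuffling `x2` across all rows also breaks its link to `x1`, while the conditional score stays close to zero, since `x2` carries almost no information about `y` once `x1` is fixed. Exact values depend on the seed and the number of bins.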

Main Results:

  • Identified two key mechanisms driving the bias: predictor selection preference during tree building and an advantage for correlated variables in unconditional permutation.
  • The new conditional permutation scheme was developed based on these findings.
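The second mechanism can be stated in terms of the null hypothesis that each permutation scheme implicitly corresponds to. The formulation below is our paraphrase, with $Z$ denoting all predictors other than $X_j$:

```latex
\begin{align*}
H_0^{\text{uncond}} &: \; X_j \perp (Y, Z) \\
H_0^{\text{cond}}   &: \; X_j \perp Y \mid Z
\end{align*}
```

A predictor that is correlated with $Z$ violates $H_0^{\text{uncond}}$ through that dependence alone, so its unconditional score can be large even when it carries no information about $Y$ beyond $Z$. Permuting $X_j$ only within strata of $Z$ preserves the dependence between $X_j$ and $Z$ and targets $H_0^{\text{cond}}$ instead.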

Conclusions:

  • The conditional variable importance measure more accurately reflects the true impact of predictor variables compared to the original marginal approach.
  • This improved method enhances the reliability of random forest variable importance for applications like gene expression studies.