Do little interactions get lost in dark random forests?

Marvin N Wright1, Andreas Ziegler1,2,3, Inke R König4

  • 1Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Campus Lübeck, Ratzeburger Allee 160, Lübeck, 23562, Germany.

BMC Bioinformatics | April 1, 2016
Summary
This summary is machine-generated.

Random forests can capture gene-gene interactions, but their importance measures struggle to detect these interactions distinctly from marginal effects. This masking effect necessitates caution when interpreting random forest findings regarding interactions.

Keywords:
Epistasis; Gene-gene interactions; Random forests; Trees; Variable importance

Area of Science:

  • Computational biology
  • Bioinformatics
  • Machine learning in genomics

Background:

  • Random forests are frequently cited for their ability to identify interaction effects.
  • The distinction between capturing and detecting interaction effects in random forests remains ambiguous.
  • This study addresses the capability of random forest variable importance measures to discern gene-gene interactions.

Purpose of the Study:

  • To investigate whether random forest variable importance measures can capture or detect gene-gene interactions.
  • To differentiate between capturing (identifying a variable involved in an interaction) and detecting (identifying the interaction effect itself).

Main Methods:

  • Extensive simulation studies were conducted.
  • Evaluated various random forest variable importance measures, including Gini importance, permutation importance, and pairwise importance methods.
  • Assessed the ability of these measures to capture and detect gene-gene interactions under different scenarios.
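
As a rough illustration of how a Gini-based measure behaves, the sketch below computes the impurity decrease of single splits on a tiny, hand-built dataset. It is not the study's simulation design: the features `a`, `b`, `c`, the XOR label, and the helper names `gini` and `gini_decrease` are assumptions made for this example only.

```python
from collections import Counter
from itertools import product


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty or pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(rows, feature):
    """Impurity decrease obtained by splitting `rows` on a binary feature."""
    n = len(rows)
    parent = gini([r["y"] for r in rows])
    left = [r["y"] for r in rows if r[feature] == 0]
    right = [r["y"] for r in rows if r[feature] == 1]
    children = len(left) / n * gini(left) + len(right) / n * gini(right)
    return parent - children


# Hypothetical toy data: every combination of three binary features once.
# The label depends only on the interaction of a and b (XOR); c is noise.
rows = [
    {"a": a, "b": b, "c": c, "y": a ^ b}
    for a, b, c in product((0, 1), repeat=3)
]

# At the root, no single feature reduces impurity.
root = {f: gini_decrease(rows, f) for f in ("a", "b", "c")}

# After a first split on a, splitting on b separates the classes.
node_a0 = [r for r in rows if r["a"] == 0]
second = {f: gini_decrease(node_a0, f) for f in ("b", "c")}

print("root split:", root)
print("within a == 0:", second)
```

In this toy setting the interaction only contributes to impurity decrease through consecutive splits on `a` and `b`, and the resulting score is still credited to the individual variables rather than to the pair.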

Main Results:

  • Gini importance captured interactions in most simulations but was often masked by marginal effects.
  • Permutation importance showed a lower proportion of captured interactions.
  • Pairwise measures performed similarly, with joint variable importance showing a slight advantage.
  • The overall detection of interactions was low, with models containing only marginal effects often showing higher detection rates than those with interaction effects.

Conclusions:

  • Random forests can capture gene-gene interactions, but current variable importance measures fail to detect them as distinct interactions.
  • Marginal effects frequently mask interaction effects, preventing differentiation.
  • Researchers should exercise caution when attributing the discovery of interactions solely to random forests.
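
The masking problem can be sketched with a toy permutation importance calculation. This is an illustration under stated assumptions, not the authors' procedure: the fixed prediction rules stand in for a fitted forest, and the names `make_rows` and `permutation_importance`, the sample size, and the number of repeats are all hypothetical.

```python
import random
from itertools import product


def accuracy(rows, predict):
    """Fraction of rows whose prediction matches the label."""
    return sum(predict(r) == r["y"] for r in rows) / len(rows)


def permutation_importance(rows, predict, feature, rng, repeats=200):
    """Mean drop in accuracy after shuffling one feature column."""
    base = accuracy(rows, predict)
    total_drop = 0.0
    for _ in range(repeats):
        column = [r[feature] for r in rows]
        rng.shuffle(column)
        shuffled = [{**r, feature: v} for r, v in zip(rows, column)]
        total_drop += base - accuracy(shuffled, predict)
    return total_drop / repeats


def make_rows(label, copies=25):
    """Balanced toy data over three binary features a, b, c."""
    return [
        {"a": a, "b": b, "c": c, "y": label(a, b)}
        for a, b, c in product((0, 1), repeat=3)
        for _ in range(copies)
    ]


rng = random.Random(0)

# Scenario 1: the label depends on a alone (marginal effect only).
marginal_rows = make_rows(lambda a, b: a)
imp_marginal = permutation_importance(marginal_rows, lambda r: r["a"], "a", rng)

# Scenario 2: the label depends on a only through its interaction with b.
interaction_rows = make_rows(lambda a, b: a ^ b)


def xor_rule(r):
    return r["a"] ^ r["b"]


imp_interaction = permutation_importance(interaction_rows, xor_rule, "a", rng)
imp_noise = permutation_importance(interaction_rows, xor_rule, "c", rng)

print("a, marginal-only model:", round(imp_marginal, 3))
print("a, interaction-only model:", round(imp_interaction, 3))
print("c, noise variable:", imp_noise)
```

In both scenarios the importance of `a` comes out at about the same value, so a single-variable score can show that `a` matters without revealing whether it acts marginally or through an interaction.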