Related Concept Videos

Receiver Operating Characteristic Plot

A ROC (Receiver Operating Characteristic) plot is a graphical tool used to assess the performance of a binary classification model by illustrating the trade-off between sensitivity (the true positive rate) and specificity (the true negative rate). By plotting sensitivity against 1 - specificity (the false positive rate) across a range of classification thresholds, the ROC curve shows how well the model distinguishes between the two classes; a curve closer to the top-left corner indicates better discrimination. The area under the ROC curve...
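As a minimal sketch with hypothetical scores and labels, the ROC points can be computed by sweeping the threshold over each distinct score, calling an observation positive when its score is at or above the threshold, and recording the false positive rate (1 - specificity) and the true positive rate (sensitivity). The area under the resulting curve is then obtained with the trapezoidal rule. The function names roc_points and trapezoid_auc are illustrative, not from the source.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) pairs, from the strictest threshold to the loosest."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing is called positive
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / n_neg, tp / n_pos))  # (1 - specificity, sensitivity)
    return points


def trapezoid_auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


scores = [0.1, 0.4, 0.35, 0.8]  # hypothetical model scores for class 1
labels = [0, 0, 1, 1]
pts = roc_points(scores, labels)
print(pts)                 # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(trapezoid_auc(pts))  # 0.75
```

A perfect ranking of positives above negatives gives an area of 1.0, while random scores give an area close to 0.5.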

Wald-Wolfowitz Runs Test I

The Wald-Wolfowitz test, also known as the runs test, is a nonparametric statistical test used to assess the randomness of a sequence made up of two types of elements (e.g., positive/negative values, successes/failures). It examines whether the order of the elements in a sequence is random or whether a pattern or trend is present. Because it is nonparametric, the test applies to any ordered data regardless of the distribution of the population or the sample, and it remains usable for large samples.
The test works...
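The source text is cut off here. As a hedged sketch of the usual large-sample version of the test, the code below counts the runs R in a two-valued sequence and compares R with its expected value under randomness, using the normal approximation with mean 2n1n2/n + 1 and variance 2n1n2(2n1n2 - n)/(n^2(n - 1)), where n1 and n2 are the counts of the two kinds of element and n = n1 + n2. For small samples, exact critical values are generally preferred over this approximation. The sequences and the function name runs_test are hypothetical.

```python
import math
from typing import Hashable, Sequence, Tuple


def runs_test(seq: Sequence[Hashable]) -> Tuple[int, float, float]:
    """Wald-Wolfowitz runs test (normal approximation).

    Returns the number of runs, the z statistic, and a two-sided p-value.
    """
    values = list(seq)
    kinds = list(dict.fromkeys(values))  # distinct elements in order of appearance
    if len(kinds) != 2:
        raise ValueError("the runs test needs exactly two kinds of element")
    n1, n2 = values.count(kinds[0]), values.count(kinds[1])
    n = n1 + n2
    # A new run starts wherever two consecutive elements differ.
    runs = 1 + sum(1 for a, b in zip(values, values[1:]) if a != b)
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    if var == 0:
        raise ValueError("sequence too short for the normal approximation")
    z = (runs - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return runs, z, p


print(runs_test("AABBBA"))      # 3 runs, fewer than the 4 expected, so z < 0
print(runs_test("ABABABABAB"))  # strict alternation gives many runs, so z > 0
```

Too few runs suggests clustering or a trend; too many suggests systematic alternation.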

Survival Tree

Survival trees are a non-parametric method used in survival analysis to model the relationship between a set of covariates and the time until an event of interest occurs, often referred to as the "time-to-event" or "survival time." This method is particularly useful when dealing with censored data, where the event has not occurred for some individuals by the end of the study period, or when the exact time of the event is unknown.
Building a Survival Tree
Constructing a survival tree begins...
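The construction steps are cut off in the source. As a minimal sketch of one common splitting rule (an assumption here, not necessarily the rule the video goes on to describe), a node can be split at the covariate cutpoint that maximizes the two-sample log-rank statistic, which compares observed and expected event counts in the two child nodes while still using censored observations in the risk sets. The names logrank_statistic and best_split, the min_node parameter, and the toy data are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def logrank_statistic(times: Sequence[float], events: Sequence[int],
                      in_left: Sequence[bool]) -> float:
    """Two-sample log-rank chi-square statistic (left child vs right child).

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    o_minus_e, var = 0.0, 0.0
    for t in event_times:
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for i in at_risk if in_left[i])
        d = sum(1 for i in at_risk if times[i] == t and events[i])
        d1 = sum(1 for i in at_risk if times[i] == t and events[i] and in_left[i])
        o_minus_e += d1 - d * n1 / n  # observed minus expected events in the left child
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e ** 2 / var if var > 0 else 0.0


def best_split(x: Sequence[float], times: Sequence[float], events: Sequence[int],
               min_node: int = 1) -> Optional[Tuple[float, float]]:
    """Return (cutpoint, statistic) for the split x <= cutpoint with the largest log-rank value."""
    best: Optional[Tuple[float, float]] = None
    for c in sorted(set(x))[:-1]:
        left: List[bool] = [xi <= c for xi in x]
        n_left = sum(left)
        if n_left < min_node or len(x) - n_left < min_node:
            continue  # enforce a minimum number of observations per child node
        stat = logrank_statistic(times, events, left)
        if best is None or stat > best[1]:
            best = (c, stat)
    return best


times = [1, 2, 3, 4]
events = [1, 1, 1, 1]
x = [1, 2, 3, 4]
print(best_split(x, times, events))              # (1, 3.0)
print(best_split(x, times, events, min_node=2))  # cutpoint 2, statistic 49/17 (about 2.88)
```

With min_node=1 the toy data favour splitting off a single observation, which is one reason tree-growing procedures usually enforce a minimum node size.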

Random Variables

A random variable assigns a single numerical value to each outcome of a random procedure. The concept of random variables is fundamental to probability theory and is generally credited to the Russian mathematician Pafnuty Chebyshev, who introduced it in the mid-nineteenth century.
Uppercase letters such as X or Y denote a random variable, and lowercase letters such as x or y denote values that the random variable takes. If X is a random variable, X is described in words, while x is given as a number.
For example, let X = the...
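As a hypothetical illustration (not the example the source goes on to give), let X be the number of heads in three tosses of a fair coin. The sketch treats X as a function from outcomes to numbers and tabulates the probability of each value x.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: all 8 equally likely sequences of three tosses, e.g. ('H', 'T', 'H').
outcomes = list(product("HT", repeat=3))


def X(outcome):
    """The random variable: maps an outcome to a number (the count of heads)."""
    return outcome.count("H")


counts = Counter(X(o) for o in outcomes)
pmf = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}
print(pmf)  # {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
```

Here X is described in words ("the number of heads"), and each key x of pmf is one numerical value it can take.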

Variability: Analysis

Measures of variability are statistical metrics that reveal the dispersion pattern within a dataset. They are pivotal in biostatistics, providing insights into the heterogeneity within health and biological data. Variability signifies the degree to which data points diverge from one another, helping researchers understand the potential range of values and associated uncertainty within the data.
The range is a simple measure of variability, indicating the difference between the highest and...
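A minimal sketch with hypothetical data, using Python's standard statistics module. The range follows the definition above; because the source text is cut off, the variance and standard deviation are included as other common measures of variability rather than as a summary of what the video covers.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements

value_range = max(data) - min(data)     # highest value minus lowest value
pop_var = statistics.pvariance(data)    # population variance: divides by n
sample_var = statistics.variance(data)  # sample variance: divides by n - 1
sample_sd = statistics.stdev(data)      # square root of the sample variance

print(value_range)            # 7
print(f"{pop_var:.2f}")       # 4.00
print(f"{sample_var:.2f}")    # 4.57
print(f"{sample_sd:.2f}")     # 2.14
```

The range depends only on the two most extreme values, whereas the variance and standard deviation use every observation.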

Randomized Experiments

Randomization assigns study participants to experimental or control groups by chance, so that each participant has an equal probability of being assigned to either group. Randomization is meant to eliminate selection bias and to balance known and unknown confounding factors, so that the control group is as similar as possible to the treatment group. A computer program or a random number generator can be used to assign participants to groups in a way that minimizes bias.
Simple randomization
Simple...
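The description of simple randomization is truncated in the source. As a minimal sketch, assuming a 1:1 allocation, simple randomization can be carried out with a seeded pseudo-random number generator so that the allocation list is reproducible; the participant IDs and the function name simple_randomization are hypothetical.

```python
import random
from typing import Dict, Iterable, Optional


def simple_randomization(participant_ids: Iterable[str],
                         seed: Optional[int] = None) -> Dict[str, str]:
    """Assign each participant independently to 'treatment' or 'control' with probability 1/2."""
    rng = random.Random(seed)  # a fixed seed makes the allocation reproducible
    return {pid: rng.choice(["treatment", "control"]) for pid in participant_ids}


ids = [f"P{i:03d}" for i in range(1, 11)]  # P001 ... P010
allocation = simple_randomization(ids, seed=2024)
for pid, group in allocation.items():
    print(pid, group)
```

Because each assignment is an independent coin flip, simple randomization does not guarantee equal group sizes, and the imbalance can be noticeable in small trials.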

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

STrategies for developing REseArch Methods guidance (STREAM): Protocol.

Journal of clinical epidemiology·2026

Same author

The statistical software revolution in pharmaceutical development: challenges and opportunities in open source.

Drug discovery today·2026

Same author

On "Confirmatory" Methodological Research in Statistics and Related Fields.

Statistics in medicine·2025

Same author

ChatGPT as a Tool for Biostatisticians: A Tutorial on Applications, Opportunities, and Limitations.

Statistics in medicine·2025

Same author

Rethinking the Handling of Method Failure in Comparison Studies.

Statistics in medicine·2025

Same author

Comparing supervised machine learning algorithms for the prediction of partial arterial pressure of oxygen during craniotomy.

BMC medical informatics and decision making·2025

Same journal

OpenIMC: an open-source platform for analyzing single-cell and spatial proteomics by imaging mass cytometry.

BMC bioinformatics·2026

Same journal

NAP: an open source pipeline for cross-domain microbiome profiling using Nanopore sequencing-derived amplicon data.

BMC bioinformatics·2026

Same journal

SurvGME: an R package for survival analysis with graphical and measurement error models.

BMC bioinformatics·2026

Same journal

SimMapNet: a Bayesian framework for gene regulatory network inference using gene ontology similarities as external hint.

BMC bioinformatics·2026

Same journal

Dual channel drug-drug interactions extraction based on cross attention.

BMC bioinformatics·2026

Same journal

FeSseqdb: a curated sequence-level database and interpretable machine learning framework for identifying iron-sulfur proteins.

BMC bioinformatics·2026

Related Experiment Video

Updated: May 12, 2026

Selecting Multiple Biomarker Subsets with Similarly Effective Binary Classification Performances

Published on: October 11, 2018

An AUC-based permutation variable importance measure for random forests.

Silke Janitza¹, Carolin Strobl, Anne-Laure Boulesteix

¹Department of Medical Informatics, Biometry and Epidemiology, University of Munich, Marchioninistr. 15, D-81377, Munich, Germany. janitza@ibe.med.uni-muenchen.de

BMC Bioinformatics

April 9, 2013

Summary

This summary is machine-generated.

The standard permutation variable importance measure (VIM) for random forests performs poorly when the class sizes are unbalanced. An AUC-based permutation VIM performs better on unbalanced data and gives results similar to the standard VIM on balanced data.

Area of Science:

Machine Learning
Statistical Modeling
Bioinformatics

Background:

Random Forest (RF) is a popular classification and variable importance tool for high-dimensional data.
RF classification performance degrades with unbalanced data (unequal class sizes).
Variable importance measures (VIMs) in RF have not been thoroughly evaluated for unbalanced data.

Purpose of the Study:

To investigate the performance of the standard permutation VIM with unbalanced data.
To introduce and evaluate a novel AUC-based permutation VIM robust to class imbalance.

Main Methods:

Explored standard and AUC-based permutation VIMs using simulated and real-world imbalanced datasets.
Compared VIM performance across varying levels of class imbalance.
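The two measures compared in the study can be sketched as follows. This is a simplified illustration, not the implementation in 'party': each tree is represented by a hypothetical scoring function that returns a score for the positive class, together with the indices of its out-of-bag (OOB) observations. For each tree, the values of one predictor are permuted among the OOB observations; the standard VIM records the increase in OOB error rate and the AUC-based VIM records the decrease in OOB AUC, and both are averaged over trees. The 0.5 classification cutoff, skipping trees whose OOB sample contains only one class, and all names are assumptions.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Row = List[float]
Tree = Tuple[Callable[[Row], float], List[int]]  # (score for class 1, OOB indices)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """P(a random positive scores higher than a random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def error_rate(scores: Sequence[float], labels: Sequence[int], cutoff: float = 0.5) -> float:
    """Misclassification rate when scores above the cutoff are called class 1."""
    return sum((s > cutoff) != (y == 1) for s, y in zip(scores, labels)) / len(labels)


def _permuted_copy(rows: List[Row], var: int, rng: random.Random) -> List[Row]:
    """Copy the OOB rows and shuffle column `var` among them."""
    column = [r[var] for r in rows]
    rng.shuffle(column)
    out = [list(r) for r in rows]
    for r, v in zip(out, column):
        r[var] = v
    return out


def permutation_vims(trees: Sequence[Tree], X: Sequence[Row], y: Sequence[int],
                     var: int, seed: int = 0) -> Tuple[float, float]:
    """Return (error-rate-based VIM, AUC-based VIM) for predictor `var`."""
    rng = random.Random(seed)
    err_diffs, auc_diffs = [], []
    for score, oob in trees:
        rows = [list(X[i]) for i in oob]
        labels = [y[i] for i in oob]
        before = [score(r) for r in rows]
        after = [score(r) for r in _permuted_copy(rows, var, rng)]
        err_diffs.append(error_rate(after, labels) - error_rate(before, labels))
        if len(set(labels)) == 2:  # AUC is undefined if the OOB sample has one class
            auc_diffs.append(auc(before, labels) - auc(after, labels))
    return mean(err_diffs), (mean(auc_diffs) if auc_diffs else 0.0)


# Toy usage: a hypothetical two-tree "forest" of threshold stumps on predictor 0.
def stump(row: Row) -> float:
    return 1.0 if row[0] > 0.5 else 0.0


X = [[0.1, 5.0], [0.2, 1.0], [0.8, 3.0], [0.9, 2.0], [0.3, 4.0], [0.7, 0.0]]
y = [0, 0, 1, 1, 0, 1]
trees = [(stump, [0, 1, 2, 3]), (stump, [2, 3, 4, 5])]
print(permutation_vims(trees, X, y, var=0))  # predictor used by the trees
print(permutation_vims(trees, X, y, var=1))  # unused predictor: (0.0, 0.0)
```

Using the same permutation for both measures within a tree keeps the comparison between them paired.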

Main Results:

The novel AUC-based permutation VIM outperforms the standard permutation VIM on unbalanced data.
Both VIMs demonstrate comparable performance on balanced data.
Standard permutation VIM's discrimination ability decreases with increasing class imbalance.
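One way to see why an error-rate-based measure can lose sensitivity as imbalance grows (an illustration with hypothetical data, not the paper's own analysis): with 5 minority and 95 majority observations, a classifier that always predicts the majority class already has an error rate of only 0.05. When trees predict the majority class for almost every observation, permuting a predictor rarely changes a prediction, so the error rate, and with it the standard VIM, changes very little. AUC, by contrast, rates a constant, uninformative score at 0.5 regardless of the class sizes.

```python
from typing import Sequence


def error_rate(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that disagree with the true labels."""
    return sum(p != t for p, t in zip(preds, labels)) / len(labels)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mann-Whitney form of the AUC; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [1] * 5 + [0] * 95   # 5% minority class
always_majority = [0] * 100   # ignores every predictor
print(error_rate(always_majority, labels))  # 0.05
print(auc([0.0] * 100, labels))             # 0.5
```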

Conclusions:

The AUC-based permutation VIM is a more robust measure for variable importance in the presence of class imbalance.
The new VIM is implemented in the R package 'party' for the random forest variant based on conditional inference trees.
The code used in the study is available for reproducibility.