A general framework for inference on algorithm-agnostic variable importance.

Brian D Williamson¹, Peter B Gilbert¹,², Noah R Simon²

¹Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center.

Journal of the American Statistical Association

November 20, 2023

Summary

This summary is machine-generated.

This study introduces a framework for assessing variable importance that measures a feature's intrinsic predictive potential rather than its role within one specific algorithm. The method provides valid confidence intervals and hypothesis tests for feature importance.

Keywords:

machine learning; statistical inference; targeted learning; variable importance

Area of Science:

Statistical modeling
Machine learning
Bioinformatics

Background:

Assessing feature importance is crucial for prediction tasks.
Current methods often depend on specific algorithms, potentially misrepresenting intrinsic feature value.
There's a need for algorithm-agnostic variable importance measures.

Purpose of the Study:

To propose a general framework for nonparametric, interpretable, and algorithm-agnostic variable importance inference.
To define variable importance based on population-level prediction contrasts.
To develop methods for valid confidence intervals and hypothesis testing.
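
A population-level prediction contrast of this kind can be written down compactly. The notation below is an assumption made for illustration and does not appear in this summary: $V$ is a predictiveness measure, $P_0$ the data-generating distribution, $f_0$ the best possible ("oracle") prediction function using all features, and $f_{0,s}$ the oracle prediction function that is not allowed to use the features indexed by $s$.

```latex
% Importance of the feature group s: loss in oracle predictiveness
% when those features are withheld.
\psi_{0,s} = V(f_0, P_0) - V(f_{0,s}, P_0)

% One possible choice of V (an assumed example): the population R^2.
V(f, P) = 1 - \frac{E_P\!\left[\{Y - f(X)\}^2\right]}{\operatorname{Var}_P(Y)}
```

Under this sketch, $\psi_{0,s} \ge 0$ by construction, and $\psi_{0,s} = 0$ corresponds to the features in $s$ adding no predictive value beyond the remaining features.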

Main Methods:

Developed a general framework for nonparametric inference on variable importance.
Defined variable importance as a contrast between oracle predictiveness with and without specific features.
Proposed an efficient nonparametric estimation procedure for confidence intervals.
Outlined a strategy for testing the null importance hypothesis.
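
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: R² is assumed as the predictiveness measure, ordinary least squares stands in for whatever learner would be used in practice, and the full and reduced models are evaluated on separate halves of the data (an assumed device so that the standard error does not collapse when the true importance is zero). The function names `fit_ols`, `r2_and_influence`, and `variable_importance` are hypothetical.

```python
import numpy as np
from math import erf, sqrt


def fit_ols(X, y):
    """Least-squares fit with intercept; a stand-in for any prediction algorithm."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Z: np.column_stack([np.ones(len(Z)), Z]) @ coef


def r2_and_influence(y, pred):
    """Plug-in R^2 plus per-observation influence values (delta method)."""
    mse_i = (y - pred) ** 2
    var_i = (y - y.mean()) ** 2
    mse, var = mse_i.mean(), var_i.mean()
    infl = -(mse_i - mse) / var + mse / var**2 * (var_i - var)
    return 1 - mse / var, infl


def variable_importance(X, y, drop, seed=0):
    """Estimate R^2(all features) - R^2(features without `drop`), with CI and p-value."""
    rng = np.random.default_rng(seed)
    half_a, half_b = np.array_split(rng.permutation(len(y)), 2)
    all_cols = list(range(X.shape[1]))
    keep = [j for j in all_cols if j not in drop]

    def cross_fitted(half, cols):
        # Two-fold cross-fitting: train on one fold, predict on the other.
        folds = np.array_split(half, 2)
        y_out, p_out = [], []
        for k in (0, 1):
            train, test = folds[1 - k], folds[k]
            model = fit_ols(X[np.ix_(train, cols)], y[train])
            y_out.append(y[test])
            p_out.append(model(X[np.ix_(test, cols)]))
        return r2_and_influence(np.concatenate(y_out), np.concatenate(p_out))

    # Assumption: full and reduced predictiveness come from disjoint halves.
    r2_full, if_full = cross_fitted(half_a, all_cols)
    r2_red, if_red = cross_fitted(half_b, keep)

    est = r2_full - r2_red
    se = sqrt(if_full.var() / len(half_a) + if_red.var() / len(half_b))
    ci = (est - 1.96 * se, est + 1.96 * se)  # Wald-type 95% interval
    # One-sided test of H0: importance <= 0, using a normal approximation.
    p_value = 1 - 0.5 * (1 + erf(est / se / sqrt(2)))
    return est, ci, p_value
```

Calling `variable_importance(X, y, drop=[0])` on an `n × p` array `X` and outcome vector `y` returns the estimated importance of the first feature, a 95% interval, and a p-value. Swapping `fit_ols` for a more flexible learner changes only how the two prediction functions are estimated, not the quantity being targeted.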

Main Results:

The proposed framework provides interpretable and algorithm-agnostic variable importance.
The estimation procedure yields valid confidence intervals, even when machine learning algorithms are used to estimate the prediction functions.
Simulations demonstrate good operating characteristics of the proposed method.
The approach was illustrated using data from an HIV-1 antibody study.

Conclusions:

The proposed framework offers a robust method for assessing intrinsic feature value.
This approach overcomes limitations of algorithm-dependent importance measures.
The method facilitates reliable variable importance inference in diverse applications, including biomedical research.