Efficient nonparametric statistical inference on population feature importance using Shapley values.

Brian D Williamson¹, Jean Feng²

¹Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA.

Proceedings of Machine Learning Research

April 22, 2021

Summary

This summary is machine-generated.

We introduce a computationally efficient estimator of the Shapley Population Variable Importance Measure (SPVIM). The method supports valid statistical inference on variable importance, which helps in understanding the data-generating mechanism and in planning future experiments.

Area of Science:

Statistics
Machine Learning
Computational Statistics

Background:

Understanding population-level variable importance is vital for data analysis and experimental design.
Accurate statistical inference on variable importance aids in interpreting data-generating mechanisms.
Exact computation of the Shapley Population Variable Importance Measure (SPVIM) is computationally demanding because the number of feature subsets to evaluate grows exponentially with the number of features.
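
The exponential cost can be seen in the standard Shapley value formula. The block below is a sketch under the assumption that SPVIM takes this standard form. The notation is introduced here and is not taken from this summary: p is the number of features, V(s) is a measure of the predictive performance achievable using only the features in subset s, and \psi_j is the importance of feature j.

```latex
\psi_j \;=\; \sum_{s \,\subseteq\, \{1,\dots,p\} \setminus \{j\}}
  \frac{1}{p} \binom{p-1}{|s|}^{-1}
  \bigl[\, V(s \cup \{j\}) - V(s) \,\bigr]
```

The sum runs over all 2^{p-1} subsets that exclude feature j, so evaluating it exactly for every feature requires V at each of the 2^p subsets of the features.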

Purpose of the Study:

To develop a computationally efficient procedure for estimating and performing statistical inference on SPVIM.
To address the computational limitations of exact SPVIM calculations.
To provide a method for valid confidence intervals and hypothesis tests for population-level variable importance.

Main Methods:

Proposed a novel estimator of SPVIM based on randomly sampled feature subsets.
For n observations, the estimator samples Θ(n) feature subsets instead of evaluating every subset, which substantially reduces the computational load.
Derived the asymptotic distribution of the proposed estimator to enable statistical inference.
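
The idea of replacing full enumeration with sampled feature subsets can be illustrated with a minimal Monte Carlo sketch. This is not the estimator from the paper: it is a simpler, generic subset-sampling approximation of Shapley values, and the names `sampled_shapley` and `toy_value`, along with the toy weights and interaction bonus, are hypothetical choices made for illustration.

```python
import random


def sampled_shapley(value, p, n_samples, seed=0):
    """Approximate Shapley values by averaging marginal contributions
    over randomly sampled feature subsets (generic Monte Carlo sketch)."""
    rng = random.Random(seed)
    estimates = []
    for j in range(p):
        others = [i for i in range(p) if i != j]
        total = 0.0
        for _ in range(n_samples):
            # Draw a subset size uniformly from 0..p-1, then a uniformly
            # random subset of that size. This matches the Shapley weights.
            k = rng.randrange(p)
            s = frozenset(rng.sample(others, k))
            total += value(s | {j}) - value(s)
        estimates.append(total / n_samples)
    return estimates


def toy_value(s):
    """Hypothetical value function: additive weights plus an interaction
    bonus when features 0 and 1 are both present."""
    weights = [1.0, 0.5, 0.25]
    bonus = 2.0 if {0, 1} <= s else 0.0
    return sum(weights[i] for i in s) + bonus


estimates = sampled_shapley(toy_value, p=3, n_samples=20000)
print(estimates)
```

For this toy value function the exact Shapley values are 2.0, 1.5, and 0.25, because the interaction bonus is split equally between features 0 and 1. The sampled estimates should land close to these values. In a variable importance setting, each call to the value function would typically involve fitting a prediction model on a feature subset, which is why limiting the number of sampled subsets matters.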

Main Results:

The proposed estimator converges at an asymptotically optimal rate.
Valid confidence intervals and hypothesis tests for SPVIM were constructed using the derived asymptotic distribution.
Simulations demonstrated good finite-sample performance.
Application to in-hospital mortality prediction showed consistent variable importance estimates across different machine learning algorithms.
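
When an estimator is asymptotically normal, a confidence interval is commonly built as the estimate plus or minus a normal quantile times the standard error. The sketch below shows that generic Wald-type construction only. It assumes an estimate and a standard error are already available, the function name `wald_interval` is hypothetical, and the numbers are made up for illustration rather than taken from the study.

```python
from statistics import NormalDist


def wald_interval(estimate, std_error, level=0.95):
    """Two-sided Wald confidence interval under approximate normality."""
    # Normal quantile for the requested coverage, about 1.96 for 95%.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * std_error, estimate + z * std_error


# Hypothetical importance estimate and standard error (not from the paper).
low, high = wald_interval(0.12, 0.03)
print(round(low, 4), round(high, 4))
```

Hypothesis tests for a null value that lies on the boundary of the parameter space, such as zero importance, can need extra care beyond this basic interval.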

Conclusions:

The developed procedure offers a computationally efficient and statistically valid approach to estimate population-level variable importance.
This method facilitates better understanding of data-generating mechanisms and informs measurement selection in future studies.
The SPVIM estimator is robust and performs well in practical applications, including medical predictions.