Efficient nonparametric statistical inference on population feature importance using Shapley values.

Brian D Williamson¹, Jean Feng²

  • ¹ Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA.

Proceedings of Machine Learning Research | April 22, 2021
Summary
This summary is machine-generated.

We introduce a computationally efficient estimator of the Shapley Population Variable Importance Measure (SPVIM). The method provides valid statistical inference on variable importance, which is crucial for understanding the data and guiding future experiments.
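For reference, SPVIM applies the standard Shapley weighting to a population-level measure of predictiveness. In the formula below, p is the number of features and V(S) is assumed notation for the predictiveness (for example, R²) of the best population prediction function that uses only the features in S. This is a sketch of the general Shapley form, not a verbatim restatement of the paper's definitions.

```latex
\psi_j = \sum_{S \subseteq \{1,\dots,p\} \setminus \{j\}}
  \frac{1}{p} \binom{p-1}{|S|}^{-1}
  \bigl[ V(S \cup \{j\}) - V(S) \bigr],
\qquad j = 1, \dots, p.
```

Each ψ_j sums over 2^{p-1} subsets, and computing every ψ_j requires V on all 2^p subsets of the features.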

Area of Science:

  • Statistics
  • Machine Learning
  • Computational Statistics

Background:

  • Understanding population-level variable importance is vital for data analysis and experimental design.
  • Accurate statistical inference on variable importance aids in interpreting data-generating mechanisms.
  • Existing methods for computing SPVIM exactly face computational challenges, because every one of the 2^p subsets of the p features must be evaluated, so the cost grows exponentially with p.
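To make the exponential cost concrete, here is a minimal sketch of exact Shapley computation by full subset enumeration. The value function is a generic stand-in (an assumption). In the SPVIM setting each call would instead be an estimate of population predictiveness, which requires fitting a model on that feature subset. The names `exact_shapley` and `ValueFn` are hypothetical and are not taken from the paper or any library.

```python
from itertools import combinations
from math import comb
from typing import Callable, Dict, FrozenSet, List, Tuple

# Hypothetical type: a value function maps a subset of feature indices to a real number.
ValueFn = Callable[[FrozenSet[int]], float]


def exact_shapley(value: ValueFn, p: int) -> Tuple[List[float], int]:
    """Exact Shapley values of `value` over features 0..p-1.

    Returns the Shapley values and the number of distinct subsets on which
    `value` was evaluated (2**p, since every subset is needed).
    """
    cache: Dict[FrozenSet[int], float] = {}

    def v(s: FrozenSet[int]) -> float:
        # Evaluate each subset only once; with real data this is one model fit.
        if s not in cache:
            cache[s] = value(s)
        return cache[s]

    phi = [0.0] * p
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):  # |S| ranges over 0, ..., p - 1
            # Shapley weight |S|! (p - |S| - 1)! / p! = 1 / (p * C(p-1, |S|))
            weight = 1.0 / (p * comb(p - 1, size))
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[j] += weight * (v(s | {j}) - v(s))
    return phi, len(cache)


if __name__ == "__main__":
    # Toy game: value 1 only when features 0 and 1 are both present.
    pair_game = lambda s: 1.0 if {0, 1} <= s else 0.0
    phi, n_evals = exact_shapley(pair_game, 3)
    print([round(x, 3) for x in phi])  # [0.5, 0.5, 0.0]
    print(n_evals)  # 8 = 2**3 subsets
```

Adding one feature doubles the number of subsets that must be evaluated. That doubling is what motivates a sampling-based estimator.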

Purpose of the Study:

  • To develop a computationally efficient procedure for estimating and performing statistical inference on SPVIM.
  • To address the computational limitations of exact SPVIM calculations.
  • To provide a method for valid confidence intervals and hypothesis tests for population-level variable importance.

Main Methods:

  • Proposed a novel estimator for SPVIM based on random sampling of feature subsets.
  • The estimator samples Θ(n) feature subsets for n observations, so the number of subsets evaluated grows linearly with the sample size rather than exponentially with the number of features.
  • Derived the asymptotic distribution of the proposed estimator to enable statistical inference.
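The idea of estimating Shapley values from randomly sampled subsets can be sketched with the weighted-least-squares characterization of Shapley values (the formulation popularized by KernelSHAP). Subsets are drawn with probability proportional to the Shapley kernel weight. A least-squares fit is then constrained so the values sum to V(all features) − V(no features). Several details are assumptions for illustration: `V` is treated as a known function rather than estimated from data, and `sampled_shapley` is a hypothetical name. Whether this matches the authors' estimator in every detail is not confirmed here. In the spirit of the Θ(n) rule in Main Methods, `n_subsets` would be set proportional to the number of observations; the constant is left to the user.

```python
from typing import Callable, FrozenSet

import numpy as np

# Hypothetical type: a value function maps a subset of feature indices to a real number.
ValueFn = Callable[[FrozenSet[int]], float]


def sampled_shapley(
    value: ValueFn, p: int, n_subsets: int, rng: np.random.Generator
) -> np.ndarray:
    """Approximate Shapley values from randomly sampled feature subsets.

    Subsets are drawn with probability proportional to the Shapley kernel
    weight, then a least-squares fit is constrained so that the values sum
    to value(all features) - value(no features).
    """
    if p < 2:
        raise ValueError("need at least two features")

    # Total kernel weight of all subsets of size s is proportional to
    # 1 / (s * (p - s)); draw a size, then a uniform subset of that size.
    sizes = np.arange(1, p)
    size_probs = 1.0 / (sizes * (p - sizes))
    size_probs /= size_probs.sum()

    v_empty = value(frozenset())
    v_full = value(frozenset(range(p)))

    Z = np.zeros((n_subsets, p))  # row k: indicator of features in S_k
    y = np.zeros(n_subsets)       # row k: value(S_k) - value(empty set)
    for k in range(n_subsets):
        s = rng.choice(sizes, p=size_probs)
        members = rng.choice(p, size=s, replace=False)
        Z[k, members] = 1.0
        y[k] = value(frozenset(int(m) for m in members)) - v_empty

    # Minimise ||Z phi - y||^2 subject to sum(phi) = v_full - v_empty by
    # solving the KKT system [[2 Z'Z, 1], [1', 0]] [phi; lam] = [2 Z'y; delta].
    kkt = np.zeros((p + 1, p + 1))
    kkt[:p, :p] = 2.0 * Z.T @ Z
    kkt[:p, p] = 1.0
    kkt[p, :p] = 1.0
    rhs = np.concatenate([2.0 * Z.T @ y, [v_full - v_empty]])
    solution = np.linalg.lstsq(kkt, rhs, rcond=None)[0]
    return solution[:p]


if __name__ == "__main__":
    # Toy game: value 1 only when features 0 and 1 are both present
    # (exact Shapley values are 0.5, 0.5, 0, 0).
    pair_game = lambda s: 1.0 if {0, 1} <= s else 0.0
    phi = sampled_shapley(pair_game, p=4, n_subsets=500, rng=np.random.default_rng(0))
    print(np.round(phi, 2))
```

The equality constraint is solved exactly through the KKT system. The returned values therefore always sum to V(all) − V(none), whatever subsets were drawn. Only the split of that total among features carries sampling error.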

Main Results:

  • The proposed estimator converges at an asymptotically optimal rate.
  • Valid confidence intervals and hypothesis tests for SPVIM were constructed using the derived asymptotic distribution.
  • Simulations demonstrated good finite-sample performance.
  • Application to in-hospital mortality prediction showed consistent variable importance estimates across different machine learning algorithms.
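Once an estimator has an asymptotically normal distribution and a standard error, confidence intervals and tests take the familiar Wald form. A minimal sketch follows. It assumes the standard error has already been obtained from the estimator's asymptotic variance, which is not computed here. Testing against a one-sided null of zero importance is an illustrative choice, not a statement of the paper's exact procedure. The function name `wald_inference` is hypothetical.

```python
from statistics import NormalDist
from typing import Tuple


def wald_inference(
    estimate: float, std_error: float, alpha: float = 0.05
) -> Tuple[float, float, float]:
    """Wald-type (1 - alpha) interval and one-sided p-value.

    The p-value is for H0: psi = 0 against H1: psi > 0, using the normal
    approximation estimate / std_error ~ N(0, 1) under the null.
    """
    if std_error <= 0:
        raise ValueError("std_error must be positive")
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z = std_normal.inv_cdf(1 - alpha / 2)
    lower = estimate - z * std_error
    upper = estimate + z * std_error
    p_value = 1.0 - std_normal.cdf(estimate / std_error)
    return lower, upper, p_value


if __name__ == "__main__":
    # Illustrative numbers only, not results from the paper.
    print(wald_inference(estimate=0.05, std_error=0.02))
```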

Conclusions:

  • The developed procedure offers a computationally efficient and statistically valid approach to estimate population-level variable importance.
  • This method facilitates better understanding of data-generating mechanisms and informs measurement selection in future studies.
  • The SPVIM estimator is robust and performs well in practical applications, including medical prediction tasks such as in-hospital mortality prediction.