
Residual permutation tests for feature importance in machine learning.

Po-Hsien Huang¹

¹National Chengchi University, Taipei City, Taiwan.

The British Journal of Mathematical and Statistical Psychology

August 30, 2025

Summary

This summary is machine-generated.

This study introduces residual permutation tests (RPTs) for machine learning (ML) hypothesis testing. RPT-X effectively assesses feature significance, maintaining statistical accuracy across various ML models.

Keywords:

feature importance, machine learning, permutation test


Area of Science:

Psychology
Computer Science
Statistics

Background:

Traditional psychological research relies heavily on linear models for hypothesis testing.
Machine learning (ML) offers advanced methods for exploring complex, non-linear variable relationships.
Current feature importance tools in ML lack robust statistical inference capabilities.

Purpose of the Study:

To develop statistically sound methods for hypothesis testing within machine learning frameworks.
To introduce residual permutation tests (RPTs) as a tool for assessing feature significance in ML models.
To address the gap in inferential statistics for interpreting 'black-box' ML algorithms.

Main Methods:

Introduced two variants of residual permutation tests: RPT on Y (RPT-Y) and RPT on X (RPT-X).
RPT-Y permutes label residuals conditioned on other features.
RPT-X permutes target feature residuals conditioned on other features.
Conducted a comprehensive simulation study across diverse ML algorithms.
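
The RPT-X idea listed above can be illustrated with a minimal Python sketch. This is not the author's implementation: the summary only states that RPT-X permutes the target feature's residuals conditioned on the other features, so everything else here is an assumption made for illustration. In particular, the conditional model is taken to be ordinary least squares, the test statistic is taken to be mean squared error on held-out data, and the helper names (ols_fit, ols_predict, rpt_x_pvalue) are hypothetical.

```python
import numpy as np


def ols_fit(X, y):
    """Least-squares coefficients for y ~ [1, X] (intercept first)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def ols_predict(coef, X):
    """Predictions from coefficients returned by ols_fit."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef


def rpt_x_pvalue(predict, X, y, j, n_perm=199, seed=0):
    """Permutation p-value for feature j of an already-fitted model.

    Assumed procedure (illustrative, not taken from the paper):
      1. Regress feature j on the remaining features and keep the residuals.
      2. Rebuild feature j as fitted values plus shuffled residuals.
      3. Compare the model's loss on the rebuilt data with the observed loss.
    """
    rng = np.random.default_rng(seed)
    others = np.delete(X, j, axis=1)

    # Step 1: the part of feature j explained by the other features
    fitted = ols_predict(ols_fit(others, X[:, j]), others)
    resid = X[:, j] - fitted

    loss_obs = np.mean((y - predict(X)) ** 2)

    # Steps 2-3: count permutations whose loss is no worse than observed
    n_as_good = 0
    for _ in range(n_perm):
        X_perm = X.copy()
        X_perm[:, j] = fitted + rng.permutation(resid)
        loss_perm = np.mean((y - predict(X_perm)) ** 2)
        n_as_good += int(loss_perm <= loss_obs)

    return (1 + n_as_good) / (n_perm + 1)


# Synthetic data (made up for this example): x_signal affects y, x_null does not.
rng = np.random.default_rng(42)
n = 400
z = rng.normal(size=n)
x_signal = 0.5 * z + rng.normal(size=n)
x_null = 0.5 * z + rng.normal(size=n)
y = 2.0 * x_signal + z + rng.normal(size=n)
X = np.column_stack([x_signal, x_null, z])

# Fit on the first half, test on the second half.
train, test = slice(0, 200), slice(200, None)
coef = ols_fit(X[train], y[train])


def predict(data):
    return ols_predict(coef, data)


p_signal = rpt_x_pvalue(predict, X[test], y[test], j=0)
p_null = rpt_x_pvalue(predict, X[test], y[test], j=1)
print(f"p-value, signal feature: {p_signal:.3f}")
print(f"p-value, null feature:   {p_null:.3f}")
```

Because rpt_x_pvalue only needs a predict callable, the linear model used here as a stand-in could be swapped for any fitted ML model. Shuffling residuals rather than the raw feature keeps the permuted feature's relationship with the other features roughly intact, which is the motivation for conditioning on them.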

Main Results:

RPT-X demonstrated stable empirical Type I error rates below the nominal level.
RPT-X showed appropriate statistical power in both regression and classification tasks.
The study validated RPT-X performance across a wide range of ML algorithms.

Conclusions:

Residual permutation tests, particularly RPT-X, provide a valid approach for statistical inference in ML.
RPT-X is a valuable tool for hypothesis testing, enhancing the interpretability of ML models.
The findings support the broader adoption of RPT-X in psychological research and other ML applications.