Related Concept Videos

Mismatch Repair

Organisms can detect and fix nucleotide mismatches that arise during DNA replication. This process requires distinguishing the newly synthesized strand from the template and replacing the erroneous bases with correct nucleotides. Mismatch repair is coordinated by many proteins in both prokaryotes and eukaryotes.
The Mutator Protein Family Plays a Key Role in DNA Mismatch Repair
The human genome has more than 3 billion base pairs of DNA per cell. Prior to cell division, that vast amount of genetic...

Expected Frequencies in Goodness-of-Fit Tests

A goodness-of-fit test determines whether the observed frequencies in a dataset are statistically consistent with the frequencies expected under a hypothesized distribution. Sometimes every category is expected to occur equally often, as when predicting how often each face of a fair die appears. In that case, the expected frequency of each category is the total number of observations (n) divided by the number of categories (k), that is, E = n/k.
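The equal-frequency rule E = n/k can be sketched as follows. As an assumption beyond the text, the sketch also computes the standard Pearson chi-squared goodness-of-fit statistic, the sum of (O − E)²/E over the categories, which would then be compared with a chi-squared distribution with k − 1 degrees of freedom. The die counts are hypothetical, and the function names are illustrative.

```python
def expected_frequencies(n, k):
    """Equal expected frequency n/k for each of k equally likely categories."""
    return [n / k] * k


def chi_square_statistic(observed, expected):
    """Pearson goodness-of-fit statistic: sum of (O - E)^2 / E over categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts from 60 rolls of a die (one count per face).
observed = [8, 12, 9, 11, 10, 10]
expected = expected_frequencies(sum(observed), len(observed))
print(expected)  # [10.0, 10.0, 10.0, 10.0, 10.0, 10.0]
print(chi_square_statistic(observed, expected))
```

For these counts the statistic is about 1.0: (4 + 4 + 1 + 1 + 0 + 0) / 10.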

Weighted Mean

The arithmetic, geometric, and harmonic means of a sample data set all give every data point equal importance. In some data sets, however, not all values are equally important. An intrinsic bias in the data may make it appropriate to give more weight to specific values than to others.
For example, consider the number of goals scored in the matches of a tournament. While computing the average number of goals scored in the tournament, it may be more important to...
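A weighted mean replaces the equal weights with a weight w_i for each value x_i, giving Σ w_i x_i / Σ w_i. Here is a minimal sketch. The values and weights in the usage lines are hypothetical and are not taken from the tournament example.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


print(weighted_mean([1, 2, 3], [1, 1, 2]))  # 2.25
print(weighted_mean([1, 2, 3], [1, 1, 1]))  # 2.0 (equal weights give the ordinary mean)
```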

Fisher's Exact Test

Fisher's exact test is a statistical significance test widely used to analyze 2×2 contingency tables, particularly when sample sizes are small. The chi-squared test approximates P-values and is generally considered reliable only when every cell has an expected frequency of at least five. Fisher's exact test, by contrast, calculates the exact probability (P-value) of observing the data, or more extreme results, under the null hypothesis. This feature makes it especially valuable when the assumptions of the...
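Under the null hypothesis, and with the row and column totals of a 2×2 table held fixed, the count in the top-left cell follows a hypergeometric distribution. The sketch below computes exact table probabilities with rational arithmetic. It assumes the common two-sided convention of summing the probabilities of all tables no more likely than the observed one; other two-sided conventions exist. The function name is illustrative.

```python
from fractions import Fraction
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def table_prob(x):
        # Hypergeometric probability that the top-left cell equals x, margins fixed.
        return Fraction(comb(row1, x) * comb(row2, col1 - x), denom)

    p_observed = table_prob(a)
    # Feasible values of the top-left cell given the fixed margins.
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [table_prob(x) for x in range(lo, hi + 1)]
    # Sum the probabilities of all tables no more likely than the observed one.
    return float(sum(p for p in probs if p <= p_observed))


print(fisher_exact_two_sided(3, 1, 1, 3))
```

For the table [[3, 1], [1, 3]], the feasible tables have probabilities 1/70, 16/70, 36/70, 16/70 and 1/70. The P-value is therefore (1 + 16 + 16 + 1)/70 = 34/70, about 0.486. Exact fractions avoid floating-point ties when comparing table probabilities.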

Trimmed Mean

When the mean of a data set is used to describe its central tendency, care is needed. This applies equally to the arithmetic, geometric, and harmonic means, because a single outlier can shift the mean substantially. In other words, the mean is sensitive to extreme values in the data set.
Although certain measures of central tendency are not sensitive to outliers, there are alternative versions of the mean that get around the...
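One such alternative is the trimmed mean: sort the data, discard a fixed proportion of values from each end, and average the rest. This sketch assumes the convention of removing floor(proportion × n) values from each end. Other conventions differ in how they handle fractional counts. The sample data are hypothetical.

```python
def trimmed_mean(data, proportion):
    """Mean after removing floor(proportion * n) values from each end of the sorted data."""
    if not data:
        raise ValueError("data must not be empty")
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    values = sorted(data)
    k = int(proportion * len(values))  # values dropped from each end
    kept = values[k:len(values) - k]
    return sum(kept) / len(kept)


sample = [2, 3, 3, 4, 5, 5, 6, 7, 8, 100]  # hypothetical data with one outlier
print(sum(sample) / len(sample))  # 14.3 (ordinary mean, pulled up by the outlier)
print(trimmed_mean(sample, 0.1))  # 5.125
```

Trimming 10% removes the smallest value (2) and the outlier (100). The remaining eight values average 5.125.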

You might also read

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

Development and validation of a Relapsing Polychondritis disease-specific Quality of Life instrument (ERN ReCONNET RP-QoL).

Rheumatology (Oxford, England)·2026

Same author

Evaluating disease burden in German AAV patients using the AAV-PRO: associations with disease activity, physical function, depression, fatigue and quality of life.

Journal of patient-reported outcomes·2026

Same author

Identifying quality of life domains and facets affected in relapsing polychondritis: a qualitative analysis for the development of a disease-specific health-related quality of life instrument.

Orphanet journal of rare diseases·2026

Same author

Validation of the German version of the ANCA-associated vasculitis patient-reported outcome questionnaire.

Clinical and experimental rheumatology·2026

Same author

Understanding pre-training data effects in retinal foundation models using two large fundus cohorts.

Nature communications·2026

Same author

Association of G-Protein-Coupled Receptors autoantibodies with vasoregulation in Post-COVID.

PloS one·2026

Same journal

conMItion: an R package adjusting confounding factors for associations in multi-omics.

Bioinformatics (Oxford, England)·2026

Same journal

SpaMFG: a Spatial Multi-omics Integration Method based on Feature Grouping.

Bioinformatics (Oxford, England)·2026

Same journal

CSCN: Inference of Cell-Specific Causal Networks Using Single-Cell RNA-Seq Data.

Bioinformatics (Oxford, England)·2026

Same journal

Sparse CCA-Based Mediation Analysis with High-Dimensional Exposures and Mediators.

Bioinformatics (Oxford, England)·2026

Same journal

Enhancing Cross-Context Generalization in Drug Perturbation Prediction with a Multimodal Conditional Diffusion Framework.

Bioinformatics (Oxford, England)·2026

Same journal

Primer Design through Submodular Function Estimation.

Bioinformatics (Oxford, England)·2026

Related Experiment Video

Updated: Jun 13, 2026

Selecting Multiple Biomarker Subsets with Similarly Effective Binary Classification Performances

Published on: October 11, 2018

Permutation importance: a corrected feature importance measure.

André Altmann¹, Laura Toloşi, Oliver Sander

¹Department of Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, Saarbrücken, Germany. altmann@mpi-inf.mpg.de

Bioinformatics (Oxford, England)

April 14, 2010

Summary

This summary is machine-generated.

This study introduces a bias correction method for machine learning feature importance, enhancing model interpretability. The permutation importance (PIMP) approach improves prediction accuracy by identifying significant variables.

Area of Science:

Life Sciences
Bioinformatics
Machine Learning

Background:

In the life sciences, interpretability matters alongside prediction accuracy.
Linear models are commonly used to assess feature relevance but lack flexibility.
More complex models such as Support Vector Machines and Random Forests (RF) have advanced feature relevance estimators, but RF feature importance is biased towards categorical variables with many categories.

Purpose of the Study:

To introduce a heuristic for normalizing feature importance measures to correct bias.
To improve the interpretability and prediction accuracy of machine learning models, particularly RF.

Main Methods:

Developed a heuristic for normalizing feature importance.
Employed repeated permutations of the outcome vector to estimate the distribution of importance in a non-informative setting.
Calculated the P-value of each observed importance under this distribution, providing a corrected importance measure.
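The permutation procedure summarized above can be sketched in simplified form. In this hypothetical illustration, the absolute Pearson correlation stands in for a model-based importance such as RF importance. In the RF setting, the model would instead be refit on each permuted outcome. The P-value here is the empirical fraction of permuted importances at least as large as the observed one, with an add-one correction. That is one simple way to turn the null distribution into a P-value and is not necessarily the published method's exact estimator. The names `pimp_pvalues` and `abs_correlation` are illustrative, not from the authors' software.

```python
import random
from statistics import mean


def abs_correlation(x, y):
    """Hypothetical importance score: absolute Pearson correlation of x with y.

    Stands in for a model-based measure such as RF importance (assumption).
    """
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def pimp_pvalues(features, outcome, importance=abs_correlation, n_perm=200, seed=0):
    """Empirical P-value of each feature's observed importance under outcome permutation.

    features: dict mapping feature name -> list of values (same length as outcome).
    """
    rng = random.Random(seed)
    observed = {name: importance(col, outcome) for name, col in features.items()}
    at_least = {name: 0 for name in features}
    y = list(outcome)  # copy, so the caller's outcome vector is not shuffled
    for _ in range(n_perm):
        # Permuting the outcome destroys any real feature-outcome association,
        # so importances computed here sample the non-informative (null) setting.
        rng.shuffle(y)
        for name, col in features.items():
            if importance(col, y) >= observed[name]:
                at_least[name] += 1
    # Add-one correction keeps the P-value strictly positive.
    return {name: (at_least[name] + 1) / (n_perm + 1) for name in features}


# Usage on synthetic data: one feature tracks the outcome, one is pure noise.
data_rng = random.Random(42)
y = [data_rng.gauss(0, 1) for _ in range(100)]
features = {
    "informative": [v + 0.3 * data_rng.gauss(0, 1) for v in y],
    "noise": [data_rng.gauss(0, 1) for _ in range(100)],
}
pvals = pimp_pvalues(features, y, n_perm=200, seed=1)
print(pvals)
```

The informative feature should receive the smallest attainable P-value, 1/201. The noise feature's P-value varies with the random draws, as expected for a non-informative predictor.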

Main Results:

Non-informative predictors did not receive significant P-values in simulated data.
Informative variables were successfully recovered among non-informative ones.
P-values from permutation importance (PIMP) significantly improved variable selection and model interpretability.
An improved RF model using PIMP-selected variables showed superior prediction accuracy in real-world case studies.

Conclusions:

The PIMP method effectively corrects feature importance bias in machine learning models.
This approach enhances model interpretability by providing reliable variable significance.
The proposed RF model incorporating PIMP demonstrates improved predictive performance.