Updated: Sep 21, 2025

Feature Importance in Gradient Boosting Trees with Cross-Validation Feature Selection.

Afek Ilay Adler¹, Amichai Painsky¹

  • ¹ The Industrial Engineering Department, Tel Aviv University, Tel Aviv 69978, Israel.

Entropy (Basel, Switzerland) | May 28, 2022
PubMed
Summary
This summary is machine-generated.

Gradient Boosting Machines (GBM) with biased base learners show skewed feature importance. Using cross-validated unbiased learners improves GBM feature importance without sacrificing prediction accuracy.

Keywords:
classification and regression trees; feature importance; gradient boosting; tree-based methods

Area of Science:

  • Machine Learning
  • Data Science
  • Statistical Modeling

Background:

  • Gradient Boosting Machines (GBM) are widely used for tabular data prediction.
  • Standard GBM implementations often use decision trees whose split selection is biased towards high-cardinality categorical variables.
  • The effect of this bias on predictive performance has been studied, but its effect on feature importance has not been examined extensively.
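
The split-selection bias described above can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' code: the target is pure noise, the loss is squared error, and the function names, sample size, and number of levels are hypothetical choices made for illustration. It compares the best in-sample gain of a single split on a binary feature against a 50-level categorical feature.

```python
import random
from collections import defaultdict

def best_split_gain(x, y):
    """Largest in-sample squared-error reduction from one binary split on categorical x."""
    n, total = len(y), sum(y)
    stats = defaultdict(lambda: [0, 0.0])  # category -> [count, sum of y]
    for xi, yi in zip(x, y):
        stats[xi][0] += 1
        stats[xi][1] += yi
    # For squared error, sorting categories by mean response makes the
    # optimal two-way partition a contiguous cut in that order.
    order = sorted(stats, key=lambda c: stats[c][1] / stats[c][0])
    best, n_left, s_left = 0.0, 0, 0.0
    for c in order[:-1]:
        n_left += stats[c][0]
        s_left += stats[c][1]
        n_right, s_right = n - n_left, total - s_left
        # gain = SSE(parent) - SSE(left child) - SSE(right child)
        gain = s_left**2 / n_left + s_right**2 / n_right - total**2 / n
        best = max(best, gain)
    return best

def high_cardinality_win_rate(n=200, n_levels=50, trials=200, seed=0):
    """Share of noise-only datasets where the many-level feature has the larger gain."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        y = [rng.gauss(0, 1) for _ in range(n)]  # target unrelated to both features
        x_binary = [rng.randrange(2) for _ in range(n)]
        x_many = [rng.randrange(n_levels) for _ in range(n)]
        wins += best_split_gain(x_many, y) > best_split_gain(x_binary, y)
    return wins / trials

win_rate = high_cardinality_win_rate()
print(f"many-level feature chosen in {win_rate:.0%} of noise-only trials")
```

Neither feature carries any information here, so an unbiased selection rule would pick each about equally often. The many-level feature tends to win because it offers many more candidate partitions to fit the noise.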

Purpose of the Study:

  • To investigate the impact of biased base learners on feature importance (FI) in Gradient Boosting Machines.
  • To propose and evaluate a method for mitigating this bias in GBM feature importance measures.

Main Methods:

  • Utilizing cross-validated (CV) unbiased base learners within the GBM framework.
  • Testing the proposed framework on both synthetic and real-world datasets.
  • Comparing feature importance measures and predictive performance against standard GBM implementations.
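
One way to picture a cross-validated base learner is to score each candidate feature by the error reduction its split achieves on held-out rows rather than on the rows used to fit it. The sketch below is an illustration under assumptions, not the procedure from the paper: it uses squared error, a single split, 5 folds assigned by row index, and hypothetical helper names (`fit_split`, `cv_gain`). It compares how often in-sample and out-of-fold selection pick a weakly informative binary feature over a 50-level noise feature.

```python
import random
from collections import defaultdict

def fit_split(x, y):
    """Best binary split of categorical x for squared error.
    Returns (gain, left_categories, left_mean, right_mean)."""
    n, total = len(y), sum(y)
    stats = defaultdict(lambda: [0, 0.0])  # category -> [count, sum of y]
    for xi, yi in zip(x, y):
        stats[xi][0] += 1
        stats[xi][1] += yi
    order = sorted(stats, key=lambda c: stats[c][1] / stats[c][0])
    best = (0.0, frozenset(), total / n, total / n)
    n_left, s_left = 0, 0.0
    for pos, c in enumerate(order[:-1]):
        n_left += stats[c][0]
        s_left += stats[c][1]
        n_right, s_right = n - n_left, total - s_left
        gain = s_left**2 / n_left + s_right**2 / n_right - total**2 / n
        if gain > best[0]:
            best = (gain, frozenset(order[:pos + 1]), s_left / n_left, s_right / n_right)
    return best

def cv_gain(x, y, k=5):
    """Out-of-fold squared-error reduction of the split fitted on the other folds."""
    n, gain = len(y), 0.0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        x_tr, y_tr = [x[i] for i in train], [y[i] for i in train]
        _, left, m_left, m_right = fit_split(x_tr, y_tr)
        parent, seen = sum(y_tr) / len(y_tr), set(x_tr)
        for i in range(fold, n, k):  # held-out rows of this fold
            # Assumption: categories unseen in training fall back to the parent mean.
            pred = parent if x[i] not in seen else (m_left if x[i] in left else m_right)
            gain += (y[i] - parent) ** 2 - (y[i] - pred) ** 2
    return gain

def selection_rates(n=200, n_levels=50, trials=100, seed=0):
    """How often each rule selects the informative feature over the noise feature."""
    rng = random.Random(seed)
    naive_hits = cv_hits = 0
    for _ in range(trials):
        x_signal = [rng.randrange(2) for _ in range(n)]
        x_noise = [rng.randrange(n_levels) for _ in range(n)]
        y = [0.5 * s + rng.gauss(0, 1) for s in x_signal]  # only x_signal matters
        naive_hits += fit_split(x_signal, y)[0] > fit_split(x_noise, y)[0]
        cv_hits += cv_gain(x_signal, y) > cv_gain(x_noise, y)
    return naive_hits / trials, cv_hits / trials

naive_rate, cv_rate = selection_rates()
print(f"informative feature selected: in-sample {naive_rate:.0%}, cross-validated {cv_rate:.0%}")
```

In this toy setting the out-of-fold score penalizes a split that only memorizes noise, because the fitted child means do not transfer to held-out rows.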

Main Results:

  • Biased base learners in GBM implementations lead to surprisingly biased feature importance, despite competitive predictive performance.
  • The proposed framework using CV unbiased base learners significantly improves all GBM feature importance measures.
  • Prediction accuracy is maintained at a comparable level to standard GBM implementations.
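
To make the effect on feature importance concrete, the following toy experiment boosts single-split trees on residuals and accumulates a gain-based importance per feature. It is a sketch under assumptions rather than a reproduction of the study: gain-based importance is one common GBM measure (the summary does not list which measures were evaluated), and the learning rate, number of rounds, fold count, and the choice to refit the selected split on all rows are hypothetical.

```python
import random
from collections import defaultdict

def fit_split(x, y):
    """Best binary split of categorical x: (gain, left_categories, left_mean, right_mean)."""
    n, total = len(y), sum(y)
    stats = defaultdict(lambda: [0, 0.0])  # category -> [count, sum of y]
    for xi, yi in zip(x, y):
        stats[xi][0] += 1
        stats[xi][1] += yi
    order = sorted(stats, key=lambda c: stats[c][1] / stats[c][0])
    best = (0.0, frozenset(), total / n, total / n)
    n_left, s_left = 0, 0.0
    for pos, c in enumerate(order[:-1]):
        n_left += stats[c][0]
        s_left += stats[c][1]
        n_right, s_right = n - n_left, total - s_left
        gain = s_left**2 / n_left + s_right**2 / n_right - total**2 / n
        if gain > best[0]:
            best = (gain, frozenset(order[:pos + 1]), s_left / n_left, s_right / n_right)
    return best

def cv_gain(x, y, k=5):
    """Out-of-fold squared-error reduction of the split fitted on the other folds."""
    n, gain = len(y), 0.0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        x_tr, y_tr = [x[i] for i in train], [y[i] for i in train]
        _, left, m_left, m_right = fit_split(x_tr, y_tr)
        parent, seen = sum(y_tr) / len(y_tr), set(x_tr)
        for i in range(fold, n, k):  # held-out rows of this fold
            pred = parent if x[i] not in seen else (m_left if x[i] in left else m_right)
            gain += (y[i] - parent) ** 2 - (y[i] - pred) ** 2
    return gain

def importance_shares(features, y, use_cv, rounds=20, lr=0.3):
    """Boost single-split trees on residuals; return each feature's share of total gain."""
    mean = sum(y) / len(y)
    resid = [yi - mean for yi in y]
    importance = [0.0] * len(features)
    for _ in range(rounds):
        if use_cv:  # choose the split feature by out-of-fold gain
            scores = [cv_gain(x, resid) for x in features]
        else:       # standard choice: largest in-sample gain
            scores = [fit_split(x, resid)[0] for x in features]
        j = scores.index(max(scores))
        gain, left, m_left, m_right = fit_split(features[j], resid)  # refit on all rows
        importance[j] += gain
        resid = [r - lr * (m_left if xi in left else m_right)
                 for r, xi in zip(resid, features[j])]
    total = sum(importance) or 1.0
    return [v / total for v in importance]

rng = random.Random(0)
n = 200
x_signal = [rng.randrange(2) for _ in range(n)]   # weakly informative binary feature
x_noise = [rng.randrange(50) for _ in range(n)]   # uninformative 50-level feature
y = [0.5 * s + rng.gauss(0, 1) for s in x_signal]

naive = importance_shares([x_signal, x_noise], y, use_cv=False)
cv = importance_shares([x_signal, x_noise], y, use_cv=True)
print("noise-feature importance share, in-sample selection:", round(naive[1], 2))
print("noise-feature importance share, CV selection:", round(cv[1], 2))
```

The comparison of interest is the importance share assigned to `x_noise`, which has no relationship to the target by construction. This example does not measure prediction accuracy, so it says nothing about the accuracy result reported above.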

Conclusions:

  • The bias in base learners significantly affects GBM feature importance, not just predictive performance.
  • Employing cross-validated unbiased base learners is an effective and computationally efficient method to correct feature importance bias in GBM.
  • This approach enhances the reliability of feature importance analysis in Gradient Boosting Machines.