Regularized impurity reduction: accurate decision trees with complexity guarantees.

Guangyi Zhang¹, Aristides Gionis¹

¹Division of Theoretical Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden.

Data Mining and Knowledge Discovery

January 9, 2023

Summary

This summary is machine-generated.

This study enhances decision tree algorithms to guarantee smaller, more interpretable models. The new approach balances accuracy and complexity, offering theoretical guarantees for tree induction.

Keywords:

Approximation algorithms; Decision trees; Impurity functions; Submodularity; Tree complexity

Area of Science:

Machine Learning
Data Mining
Artificial Intelligence

Background:

Decision trees are popular classification models known for accuracy and interpretability.
Model interpretability deteriorates as tree size increases.
Traditional algorithms lack theoretical guarantees for producing small trees.

Purpose of the Study:

To provide theoretical guarantees for producing smaller decision trees.
To enhance impurity-reduction functions for better complexity control.
To develop a tree-induction algorithm with approximation guarantees on complexity.

Main Methods:

Proposed a novel tree-induction algorithm with a logarithmic approximation guarantee on tree complexity.
Utilized a general family of impurity functions, including entropy and the Gini index.
Defined a greedy criterion that weighs how balanced a split is, its cost-efficiency, and its discriminative power.
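
To make the ingredients above concrete, here is a minimal Python sketch of the two impurity functions named (entropy and the Gini index) and of the impurity reduction of a binary split. The final function, `illustrative_score`, is a hypothetical stand-in for a weighted greedy criterion: its form, its weights, and the `cost` parameter are assumptions for illustration, not the formula from the paper.

```python
import math
from collections import Counter


def class_probs(labels):
    """Empirical class probabilities of a list of labels."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def entropy(labels):
    """Shannon entropy in bits; 0.0 for an empty or pure set."""
    if not labels:
        return 0.0
    return -sum(p * math.log2(p) for p in class_probs(labels))


def gini(labels):
    """Gini index: 1 minus the sum of squared class probabilities."""
    if not labels:
        return 0.0
    return 1.0 - sum(p * p for p in class_probs(labels))


def impurity_reduction(parent, left, right, impurity=gini):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted


def illustrative_score(parent, left, right, cost=1.0,
                       w_balance=1.0, w_impurity=1.0, impurity=gini):
    """HYPOTHETICAL criterion (not the paper's): reward balanced and
    discriminative splits, and divide by the test cost so cheap tests win ties."""
    balance = min(len(left), len(right)) / len(parent)  # 0.5 for an even split
    gain = impurity_reduction(parent, left, right, impurity)
    return (w_balance * balance + w_impurity * gain) / cost
```

With labels `["a", "a", "b", "b"]`, the split into `["a", "a"]` and `["b", "b"]` removes all Gini impurity, while the split into `["a"]` and `["a", "b", "b"]` removes only part of it and is less balanced, so the illustrative score prefers the first.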

Main Results:

The enhanced algorithm provides a tight logarithmic approximation factor for tree complexity.
Achieved an excellent balance between predictive accuracy and tree complexity.
Demonstrated effectiveness across binary and multi-class classification with non-uniform costs.
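
One way to make "tree complexity" concrete when tests have non-uniform costs is the expected total cost of the tests applied to classify an object. The sketch below assumes that measure, which this summary does not spell out; the `Node` class and the example tree are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    prob: float                      # probability that an object reaches this node
    test_cost: float = 0.0           # cost of the test applied here (unused at leaves)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def expected_cost(node: Optional[Node]) -> float:
    """Sum over internal nodes of (reach probability * test cost)."""
    if node is None or (node.left is None and node.right is None):
        return 0.0  # leaves apply no test
    return (node.prob * node.test_cost
            + expected_cost(node.left)
            + expected_cost(node.right))


# Hypothetical tree: a cheap test at the root, an expensive one on half the objects.
tree = Node(
    prob=1.0, test_cost=1.0,
    left=Node(prob=0.5),
    right=Node(prob=0.5, test_cost=3.0,
               left=Node(prob=0.25), right=Node(prob=0.25)),
)
```

Here `expected_cost(tree)` is 1.0 * 1.0 + 0.5 * 3.0 = 2.5. When every test costs 1, the same quantity is the expected number of tests needed to classify an object.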

Conclusions:

The proposed enhancement successfully equips impurity functions with complexity guarantees.
The algorithm offers a practical solution for generating interpretable and accurate decision trees.
This work contributes to the theoretical understanding and practical application of decision tree induction.