Tree-Values: Selective Inference for Regression Trees.

Anna C Neufeld¹, Lucy L Gao², Daniela M Witten³

¹Department of Statistics, University of Washington, Seattle, WA 98195, USA.

Journal of Machine Learning Research : JMLR

March 14, 2024

Summary

This summary is machine-generated.

This study introduces a new selective inference framework for Classification and Regression Trees (CART). The methods ensure reliable statistical guarantees for inference on CART model outputs, controlling error rates and coverage.

Keywords:

CART
Regression trees
Hypothesis testing
Post-selection inference
Selective inference

Area of Science:

Statistics
Machine Learning
Data Mining

Background:

Because a Classification and Regression Tree (CART) is fit to the same data that is later used for inference, inference on its outputs requires specialized methods.
Naive inference approaches fail to provide standard statistical guarantees like Type 1 error control.
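
The failure of naive inference can be illustrated with a small simulation. The sketch below is not the method of the paper; it only shows the problem the paper addresses. It assumes a single covariate (so observations are already ordered), a depth-one tree chosen by minimizing the sum of squared errors, a known noise standard deviation of 1, and a response that is pure noise. The function names `best_split`, `naive_p_value`, and `naive_rejection_rate` are hypothetical.

```python
import math
import random


def best_split(y, min_leaf=5):
    """Return k such that splitting into y[:k] and y[k:] minimizes total SSE."""
    n = len(y)
    total = sum(y)
    best_k, best_gain = None, -1.0
    left_sum = sum(y[:min_leaf - 1])
    for k in range(min_leaf, n - min_leaf + 1):
        left_sum += y[k - 1]  # left_sum is now sum(y[:k])
        right_sum = total - left_sum
        # Minimizing SSE is equivalent to maximizing this between-node term.
        gain = left_sum ** 2 / k + right_sum ** 2 / (n - k)
        if gain > best_gain:
            best_k, best_gain = k, gain
    return best_k


def naive_p_value(y, k, sigma=1.0):
    """Two-sided z-test for equal node means that ignores how k was chosen."""
    n = len(y)
    diff = sum(y[:k]) / k - sum(y[k:]) / (n - k)
    z = diff / (sigma * math.sqrt(1.0 / k + 1.0 / (n - k)))
    return math.erfc(abs(z) / math.sqrt(2.0))


def naive_rejection_rate(n=40, reps=2000, alpha=0.05, seed=0):
    """Fraction of null data sets on which the naive test rejects."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # Null model: the response has the same mean everywhere.
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        k = best_split(y)
        if naive_p_value(y, k) < alpha:
            rejections += 1
    return rejections / reps


rate = naive_rejection_rate()
print(f"naive rejection rate under the null: {rate:.3f} (nominal 0.05)")
```

Because the split is chosen to make the two nodes look as different as possible, the printed rejection rate should sit well above the nominal 0.05 even though no true difference exists.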

Purpose of the Study:

To develop a selective inference framework for fitted CART models.
To ensure valid statistical inference, including error rate control and coverage, for CART outputs.

Main Methods:

Proposing a selective inference framework that conditions on the fact that the tree was estimated from the data.
Developing a test for the difference in mean response between a pair of terminal nodes that controls the selective Type 1 error rate.
Creating a confidence interval for the mean response within a single terminal node that attains the nominal selective coverage.
Providing efficient algorithms for computing the necessary conditioning sets.
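
The two guarantees named above can be written generically as follows. This is a sketch in assumed notation, not the paper's exact statement: $\mathcal{S}$ stands for the selection event (for example, that the terminal nodes under study appear in the fitted tree), $H_0$ is a null hypothesis chosen after seeing the tree, $\theta$ is the mean response in a terminal node, and $\mathrm{CI}_{1-\alpha}$ is the reported interval. The precise conditioning event used by the authors is not reproduced here.

```latex
% Selective Type 1 error control at level \alpha (assumed notation):
\Pr_{H_0}\bigl(\text{reject } H_0 \,\big|\, \mathcal{S}\bigr) \le \alpha

% Nominal selective coverage of a (1 - \alpha) confidence interval:
\Pr\bigl(\theta \in \mathrm{CI}_{1-\alpha} \,\big|\, \mathcal{S}\bigr) = 1 - \alpha
```

Both statements differ from their classical counterparts only in the conditioning on $\mathcal{S}$, which accounts for the fact that the hypothesis or parameter was chosen by looking at the data.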

Main Results:

The proposed framework successfully controls the selective Type 1 error rate for hypothesis tests.
Confidence intervals achieve nominal selective coverage for mean responses within terminal nodes.
Methods were validated through simulation studies and application to a real-world dataset.

Conclusions:

The selective inference framework offers a statistically sound approach for analyzing CART models.
These methods enhance the reliability of statistical inference in the context of machine learning algorithms.
The approach is applicable to various datasets, including those in health and nutrition research.