
Related Concept Videos

Survival Tree (01:19)

Survival trees are a non-parametric method used in survival analysis to model the relationship between a set of covariates and the time until an event of interest occurs, often referred to as the "time-to-event" or "survival time." This method is particularly useful when dealing with censored data, where the event has not occurred for some individuals by the end of the study period, or when the exact time of the event is unknown.
Building a Survival Tree
Constructing a...
Regression Analysis (01:11)

Regression analysis is a statistical tool that describes a mathematical relationship between a dependent variable and one or more independent variables.
In regression analysis, a regression equation is determined from the line of best fit, the line that most closely fits the data points plotted in a graph. This line is also called the regression line. The algebraic equation for the regression line is called the regression equation. It is represented as:
Multiple Regression (01:25)

Multiple regression assesses a linear relationship between one response or dependent variable and two or more independent variables. It has many practical applications.
Farmers can use multiple regression to determine the crop yield based on more than one factor, such as water availability, fertilizer, soil properties, etc. Here, the crop yield is the response or dependent variable as it depends on the other independent variables. The analysis requires the construction of a scatter plot...
Regression Toward the Mean (01:52)

Regression toward the mean ("RTM") is a phenomenon in which extremely high or low values (for example, an individual's blood pressure at a particular moment) appear closer to a group's average upon remeasuring. Although this statistical peculiarity is the result of random error and chance, it has been problematic across various medical, scientific, financial, and psychological applications. In particular, RTM, if not taken into account, can interfere when...
Quantifying and Rejecting Outliers: The Grubbs Test (01:02)

Sometimes, a data set can have a recorded numerical observation that greatly deviates from the rest of the data. Assuming that the data is normally distributed, a statistical method called the Grubbs test can be used to determine whether the observation is truly an outlier. To perform a two-tailed Grubbs test, first, calculate the absolute difference between the outlier and the mean. Then, calculate the ratio between this difference and the standard deviation of the sample. This...
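The two steps described above (absolute deviation from the mean, then division by the sample standard deviation) can be sketched in Python as follows. This is a minimal illustration only: the data values and the function name `grubbs_statistic` are hypothetical, and the final step of a Grubbs test, comparing the ratio with a tabulated critical value for the chosen significance level and sample size, is not shown.

```python
import math
import statistics


def grubbs_statistic(values, suspect):
    """Return G = |suspect - mean| / s, where s is the sample standard deviation."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return abs(suspect - mean) / s


# Hypothetical measurements; 14.5 is the suspected outlier.
data = [9.8, 10.1, 10.0, 9.9, 10.2, 14.5]

# Pick the observation that lies farthest from the sample mean.
data_mean = statistics.mean(data)
suspect = max(data, key=lambda x: abs(x - data_mean))

g = grubbs_statistic(data, suspect)
print(f"suspect = {suspect}, G = {g:.3f}")
```

The mean and standard deviation here are computed from the full sample, including the suspected outlier, which is the usual convention for this ratio.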
Outliers and Influential Points (01:08)

An outlier is an observation that does not fit the rest of the data. It is sometimes called an extreme value. When you graph an outlier, it will appear not to fit the pattern of the graph. Some outliers are due to mistakes (for example, writing down 50 instead of 500), while others may indicate that something unusual is happening. Outliers lie far from the least squares line in the vertical direction. They have large "errors," where the "error" or residual is the...

Related Experiment Video

Updated: Jul 1, 2025

A Machine Learning Approach to Design an Efficient Selective Screening of Mild Cognitive Impairment (12:18)

Published on: January 11, 2020

7.5K

Tree-Values: Selective Inference for Regression Trees.

Anna C Neufeld¹, Lucy L Gao², Daniela M Witten³

  • ¹Department of Statistics, University of Washington, Seattle, WA 98195, USA.

Journal of Machine Learning Research : JMLR | March 14, 2024
Summary
This summary is machine-generated.

This study introduces a new selective inference framework for Classification and Regression Trees (CART). The methods ensure reliable statistical guarantees for inference on CART model outputs, controlling error rates and coverage.

Keywords:
CART, regression trees, hypothesis testing, post-selection inference, selective inference

More Related Videos

Development of an Individual-Tree Basal Area Increment Model using a Linear Mixed-Effects Approach (04:35)

Published on: July 3, 2020

3.3K
Selecting Multiple Biomarker Subsets with Similarly Effective Binary Classification Performances (07:35)

Published on: October 11, 2018

7.5K

Area of Science:

  • Statistics
  • Machine Learning
  • Data Mining

Background:

  • Inference on Classification and Regression Trees (CART) requires specialized methods.
  • Naive inference approaches fail to provide standard statistical guarantees like Type 1 error control.
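
The failure of naive inference can be illustrated with a small simulation. The sketch below is not the method of the paper; it only demonstrates the problem under simplifying assumptions that are not taken from the source: a single covariate (so the responses are already in covariate order), a tree with one split, pure-noise responses with known standard deviation 1, and a naive two-sample z-test between the two resulting nodes. The function names are hypothetical.

```python
import math
import random


def best_split(y, min_leaf=5):
    """Return (left_size, gain) for the single CART-style split of y
    (assumed ordered by the covariate) that most reduces the sum of squared errors."""
    n = len(y)
    total = sum(y)
    best_k, best_gain = None, -1.0
    left_sum = sum(y[:min_leaf - 1])
    for k in range(min_leaf, n - min_leaf + 1):
        left_sum += y[k - 1]  # left_sum is now the sum of the first k responses
        mean_l = left_sum / k
        mean_r = (total - left_sum) / (n - k)
        # Reduction in squared error achieved by splitting after position k.
        gain = k * (n - k) / n * (mean_l - mean_r) ** 2
        if gain > best_gain:
            best_k, best_gain = k, gain
    return best_k, best_gain


def naive_p_value(y, k, sigma=1.0):
    """Two-sided z-test p-value for equal node means, ignoring that k was chosen from y."""
    n = len(y)
    mean_l = sum(y[:k]) / k
    mean_r = sum(y[k:]) / (n - k)
    z = (mean_l - mean_r) / (sigma * math.sqrt(1 / k + 1 / (n - k)))
    return math.erfc(abs(z) / math.sqrt(2))


def naive_rejection_rate(n=50, reps=1000, alpha=0.05, seed=0):
    """Fraction of null data sets in which the naive test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]  # no true difference anywhere
        k, _gain = best_split(y)
        if naive_p_value(y, k) < alpha:
            rejections += 1
    return rejections / reps


rate = naive_rejection_rate()
print(f"naive rejection rate under the null: {rate:.3f} (nominal level 0.05)")
```

Because the split is chosen to maximize the difference between the two node means, the same data that suggested the hypothesis are reused to test it, and the rejection rate under the null ends up well above the nominal level.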

Purpose of the Study:

  • To develop a selective inference framework for fitted CART models.
  • To ensure valid statistical inference, including error rate control and coverage, for CART outputs.

Main Methods:

  • Proposing a selective inference framework by conditioning on the data used for tree estimation.
  • Developing a test for mean response differences between terminal nodes controlling selective Type 1 error rate.
  • Creating a confidence interval for mean response within a terminal node achieving nominal selective coverage.
  • Providing efficient algorithms for computing necessary conditioning sets.
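
The two guarantees named in this list can be written generically as follows. This is a sketch in assumed notation, none of which appears in the summary: $Y$ is the random response vector and $y$ its observed value, $\mathrm{TREE}(\cdot)$ is the fitted CART tree, $H_0$ is a null hypothesis chosen after seeing the tree, $\mu_R$ is the mean response in a terminal node $R$, and $\alpha$ is the nominal level. The conditioning event is shown here as the fitted tree itself; the exact conditioning set used by the authors may differ.

```latex
% Selective Type 1 error control (assumed generic form):
\[
  \Pr_{H_0}\bigl(\text{reject } H_0 \text{ at level } \alpha
      \,\bigm|\, \mathrm{TREE}(Y) = \mathrm{TREE}(y)\bigr) \le \alpha
\]
% Nominal selective coverage of an interval [L(Y), U(Y)] for a terminal-node mean:
\[
  \Pr\bigl(\mu_R \in [L(Y),\, U(Y)]
      \,\bigm|\, \mathrm{TREE}(Y) = \mathrm{TREE}(y)\bigr) = 1 - \alpha
\]
```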

Main Results:

  • The proposed framework successfully controls the selective Type 1 error rate for hypothesis tests.
  • Confidence intervals achieve nominal selective coverage for mean responses within terminal nodes.
  • Methods were validated through simulation studies and application to a real-world dataset.

Conclusions:

  • The selective inference framework offers a statistically sound approach for analyzing CART models.
  • These methods enhance the reliability of statistical inference in the context of machine learning algorithms.
  • The approach is applicable to various datasets, including those in health and nutrition research.