Forward variable selection for random forest models.

Jasper Velthoen¹, Juan-Juan Cai², Geurt Jongbloed¹

¹Department of Applied Mathematics, Delft University of Technology, Delft, The Netherlands.

Journal of Applied Statistics

September 18, 2023

Summary

This summary is machine-generated.

This study introduces an interpretable prediction method for random forest models: covariates are added one at a time by forward selection, each chosen to minimize the continuous ranked probability score (CRPS). In simulation studies, the approach produces fewer false positives than existing variable selection methods for high-dimensional data.

Keywords:

CRPS, random forests, correlated covariates, forward selection, variable selection

Area of Science:

Statistics
Machine Learning
Environmental Science

Background:

Random forests are effective for high-dimensional data but lack interpretability.
Interpretable predictive models are crucial for understanding complex relationships.

Purpose of the Study:

To develop a forward variable selection method for interpretable predictive modeling.
To minimize the continuous ranked probability score (CRPS) for optimal variable selection.
To provide a statistically rigorous method for selecting relevant covariates.
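
For reference, the CRPS is commonly defined as follows for a predictive distribution function F and an observed outcome y. The notation (F, y, z, X, X') is assumed here for illustration and is not taken from the study; X and X' denote independent draws from F, which is assumed to have a finite first moment. Lower values indicate a better probabilistic forecast.

```latex
\mathrm{CRPS}(F, y)
  = \int_{-\infty}^{\infty} \bigl( F(z) - \mathbf{1}\{ z \ge y \} \bigr)^{2} \, dz
  = \mathbb{E}_F \lvert X - y \rvert - \tfrac{1}{2}\, \mathbb{E}_F \lvert X - X' \rvert
```

When F is a point mass, the second expression reduces to the absolute error, so the CRPS can be read as a generalization of absolute error to full predictive distributions.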

Main Methods:

A stepwise forward selection procedure minimizing CRPS.
A stopping criterion based on CRPS risk difference estimation.
Mathematical proof of optimality in a population sense.
Simulation studies comparing performance with existing methods.
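
The procedure listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a k-nearest-neighbour ensemble stands in for the random forest so the example needs only the standard library, the CRPS is estimated on a held-out validation set, and the stopping rule is a plain threshold (`min_gain`) rather than the study's criterion based on an estimate of the CRPS risk difference. All function names, parameters, and the simulated data are hypothetical.

```python
import random


def crps_ensemble(members, y):
    """CRPS of the empirical distribution of `members` at observation y."""
    m = len(members)
    abs_err = sum(abs(x - y) for x in members) / m
    spread = sum(abs(a - b) for a in members for b in members) / (2 * m * m)
    return abs_err - spread


def knn_ensemble(train_X, train_y, x, subset, k):
    """Stand-in for a random forest (assumption): the responses of the k
    training points nearest to x, using only the covariates in `subset`."""
    if not subset:
        return list(train_y)  # no covariates yet: use the marginal sample

    def dist(row):
        return sum((row[j] - x[j]) ** 2 for j in subset)

    order = sorted(range(len(train_X)), key=lambda i: dist(train_X[i]))
    return [train_y[i] for i in order[:k]]


def mean_crps(train, valid, subset, k):
    """Average validation CRPS of the predictor built on `subset`."""
    train_X, train_y = train
    valid_X, valid_y = valid
    scores = [
        crps_ensemble(knn_ensemble(train_X, train_y, x, subset, k), y)
        for x, y in zip(valid_X, valid_y)
    ]
    return sum(scores) / len(scores)


def forward_select(train, valid, n_covariates, k=10, min_gain=0.0):
    """Greedy forward selection: add the covariate that lowers CRPS most."""
    selected = []
    best = mean_crps(train, valid, selected, k)
    while len(selected) < n_covariates:
        candidates = [j for j in range(n_covariates) if j not in selected]
        score, j = min(
            (mean_crps(train, valid, selected + [j], k), j) for j in candidates
        )
        # Hypothetical stopping rule: stop once the CRPS gain is too small.
        if best - score <= min_gain:
            break
        selected.append(j)
        best = score
    return selected, best


# Simulated data (assumption): only covariate 0 drives the response.
rng = random.Random(0)


def make_data(n):
    X = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(n)]
    y = [3 * row[0] + rng.gauss(0, 0.1) for row in X]
    return X, y


train, valid = make_data(150), make_data(50)
selected, score = forward_select(train, valid, n_covariates=4)
print("selected covariates:", selected, "validation CRPS:", round(score, 3))
```

In this toy setting the informative covariate should be picked first, and the irrelevant ones should be rejected because adding them degrades the neighbourhoods and raises the validation CRPS. A real application would swap `knn_ensemble` for a forest-based predictive distribution.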

Main Results:

The proposed method achieves a lower false positive rate than existing techniques.
It proved effective in statistical post-processing of temperature forecasts.
It selected approximately 10% of the covariates while maintaining predictive power.

Conclusions:

The developed method offers an interpretable alternative for high-dimensional prediction.
It provides a robust approach for variable selection in statistical modeling.
The method is applicable to real-world forecasting problems, enhancing model transparency.