



Forward variable selection for random forest models.

Jasper Velthoen¹, Juan-Juan Cai², Geurt Jongbloed¹

  • ¹Department of Applied Mathematics, Delft University of Technology, Delft, The Netherlands.

Journal of Applied Statistics | September 18, 2023
Summary
This summary is machine-generated.

This study introduces an interpretable prediction method for random forest models that selects variables by forward selection, minimizing the continuous ranked probability score (CRPS). In simulation studies, the approach yields a lower false positive rate in variable selection than existing methods for high-dimensional data.

Keywords:
CRPS, Random forests, correlated covariates, forward selection, variable selection


Area of Science:

  • Statistics
  • Machine Learning
  • Environmental Science

Background:

  • Random forests are effective for high-dimensional data but lack interpretability.
  • Interpretable predictive models are crucial for understanding complex relationships.

Purpose of the Study:

  • To develop a forward variable selection method for interpretable predictive modeling.
  • To minimize the continuous ranked probability score (CRPS) for optimal variable selection.
  • To provide a statistically rigorous method for selecting relevant covariates.
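
For orientation, the CRPS of a predictive distribution at an observed value is usually written as below. This is the standard textbook form, not a formula quoted from the paper. The symbols F (predictive distribution function), y (observation), and X, X' (independent draws from F, assumed to have a finite first moment) are notation introduced here. Lower values indicate a better probabilistic forecast.

```latex
\mathrm{CRPS}(F, y)
  = \int_{-\infty}^{\infty} \bigl(F(z) - \mathbf{1}\{y \le z\}\bigr)^{2} \, dz
  = \mathbb{E}_F\lvert X - y\rvert - \tfrac{1}{2}\,\mathbb{E}_F\lvert X - X'\rvert
```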

Main Methods:

  • A stepwise forward selection procedure minimizing CRPS.
  • A stopping criterion based on CRPS risk difference estimation.
  • Mathematical proof of optimality in a population sense.
  • Simulation studies comparing performance with existing methods.
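
The stepwise procedure listed above can be sketched roughly as follows. This is a minimal illustration under loud assumptions, not the authors' implementation: a k-nearest-neighbour sample stands in for the random forest predictive distribution, and the stopping rule is a plain threshold on the validation CRPS improvement (`min_gain`), which simplifies the paper's criterion based on CRPS risk difference estimation. All function names, parameters, and the synthetic data are hypothetical.

```python
import random

def ensemble_crps(samples, y):
    """Empirical CRPS of a sample-based forecast: mean|x - y| - 0.5 * mean|x - x'|."""
    xs = sorted(samples)
    n = len(xs)
    abs_err = sum(abs(x - y) for x in xs) / n
    # mean |x - x'| over all ordered pairs, computed from the sorted sample
    spread = 2.0 * sum((2 * i - n + 1) * x for i, x in enumerate(xs)) / (n * n)
    return abs_err - 0.5 * spread

def knn_forecast(train_X, train_y, x, cols, k):
    """ASSUMPTION: k-NN responses stand in for a random forest predictive sample."""
    if not cols:
        return list(train_y)  # no covariates selected: unconditional sample
    order = sorted(range(len(train_X)),
                   key=lambda i: sum((train_X[i][c] - x[c]) ** 2 for c in cols))
    return [train_y[i] for i in order[:k]]

def mean_crps(train, valid, cols, k=15):
    """Average validation CRPS when forecasting with the columns in `cols` only."""
    (tX, ty), (vX, vy) = train, valid
    scores = [ensemble_crps(knn_forecast(tX, ty, x, cols, k), y)
              for x, y in zip(vX, vy)]
    return sum(scores) / len(scores)

def forward_select(train, valid, n_features, min_gain=0.0, k=15):
    """Greedy forward selection: add the column that lowers validation CRPS most."""
    selected = []
    best = mean_crps(train, valid, selected, k)
    while len(selected) < n_features:
        candidates = [c for c in range(n_features) if c not in selected]
        scores = {c: mean_crps(train, valid, selected + [c], k) for c in candidates}
        c_star = min(scores, key=scores.get)
        # ASSUMPTION: simple threshold rule, not the paper's risk-difference criterion
        if best - scores[c_star] <= min_gain:
            break
        selected.append(c_star)
        best = scores[c_star]
    return selected, best

def make_data(n, p, rng):
    """Hypothetical toy data: only columns 0 and 1 influence the response."""
    X = [[rng.uniform(-1, 1) for _ in range(p)] for _ in range(n)]
    y = [2 * row[0] - 3 * row[1] + rng.gauss(0, 0.1) for row in X]
    return X, y

if __name__ == "__main__":
    rng = random.Random(0)
    train, valid = make_data(200, 5, rng), make_data(60, 5, rng)
    selected, score = forward_select(train, valid, n_features=5, min_gain=0.01)
    print("selected columns:", selected, "validation CRPS:", round(score, 3))
```

Scoring each candidate on a held-out split keeps the sketch short. A closer analogue to a random forest workflow would score on out-of-bag samples, but that detail is not given in this summary and is left out here.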

Main Results:

  • The proposed method achieves a lower false positive rate compared to existing techniques.
  • Demonstrated effectiveness in statistical post-processing of temperature forecasts.
  • Selected approximately 10% of covariates while maintaining predictive power.

Conclusions:

  • The developed method offers an interpretable alternative for high-dimensional prediction.
  • It provides a robust approach for variable selection in statistical modeling.
  • The method is applicable to real-world forecasting problems, enhancing model transparency.