



Randomized boosting with multivariable base-learners for high-dimensional variable selection and prediction.

Christian Staerk¹, Andreas Mayr²

  • ¹ Department of Medical Biometry, Informatics and Epidemiology, University Hospital Bonn, Venusberg-Campus 1, 53127, Bonn, Germany. christian.staerk@uni-bonn.de.

BMC Bioinformatics | September 17, 2021
Summary
This summary is machine-generated.

We introduce new statistical boosting methods that use multivariable base-learners to improve prediction models for complex biomedical data. These algorithms offer sparser, more interpretable results, especially with highly correlated variables.

Keywords: Boosting; Feature selection; High-dimensional data; Information criteria; Sparsity; Variable selection


Area of Science:

  • Computational statistics
  • Biomedical data analysis
  • Machine learning

Background:

  • Statistical boosting traditionally uses single-variable base-learners, which can lead to overfitting and inclusion of noise variables in high-dimensional biomedical data.
  • The number of iterations in boosting is often tuned for predictive performance, potentially resulting in overly complex models.
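To make the baseline concrete, here is a minimal sketch of classical component-wise L2 boosting with single-variable linear base-learners. It is an illustration, not code from the paper: the function name `componentwise_l2_boost`, the step length `step`, and the fixed iteration count `n_iter` are assumptions chosen for this example. In practice `n_iter` is the quantity that is usually tuned for predictive performance.

```python
import numpy as np


def componentwise_l2_boost(X, y, n_iter=200, step=0.1):
    """Component-wise L2 boosting with single-variable linear base-learners.

    Illustrative sketch only. Returns (intercept, coef) on the original
    scale of X. Assumes X has no constant columns.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean = X.mean(axis=0)
    Xc = X - x_mean                    # centered covariates
    col_ss = (Xc ** 2).sum(axis=0)     # column sums of squares (assumed > 0)
    coef = np.zeros(X.shape[1])
    resid = y - y.mean()               # start from the intercept-only model

    for _ in range(n_iter):
        # Least-squares slope of the current residuals on each covariate alone.
        slopes = Xc.T @ resid / col_ss
        # Reduction in residual sum of squares each single covariate would give.
        gain = slopes ** 2 * col_ss
        j = int(np.argmax(gain))
        # Weak update: move only a fraction `step` toward the best single fit.
        coef[j] += step * slopes[j]
        resid = resid - step * slopes[j] * Xc[:, j]

    intercept = y.mean() - x_mean @ coef
    return intercept, coef
```

Because one covariate is updated in every iteration and nothing in the loop itself decides when to stop, running longer tends to pull in more covariates, including noise variables.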

Purpose of the Study:

  • To extend classical component-wise gradient boosting with multivariable base-learners for improved model selection and estimation.
  • To develop algorithms that automatically stop and yield sparser, more interpretable prediction models for high-dimensional biomedical data.

Main Methods:

  • Propose three extensions of classical component-wise gradient boosting.
  • Subspace Boosting (SubBoost) uses multivariable base-learners and selects them by an information criterion, which also determines when to stop.
  • Random Subspace Boosting (RSubBoost) adds random preselection of base-learners for scalability.
  • Adaptive Subspace Boosting (AdaSubBoost) uses adaptive preselection that focuses on predictive base-learners.
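The general idea behind the random-subspace variant can be sketched as follows. This is a deliberately simplified illustration and should not be read as the authors' algorithm: it uses the plain BIC on the current residuals, and the function `random_subspace_boost` and the parameters `q` (size of the random preselection), `s` (maximum base-learner size), `step`, and `patience` (number of consecutive iterations without an accepted update before stopping) are all assumptions made for this example.

```python
import itertools
import numpy as np


def bic(rss, n, k):
    """Gaussian BIC (up to a constant) for a fit with k estimated coefficients."""
    return n * np.log(max(rss, 1e-12) / n) + k * np.log(n)


def random_subspace_boost(X, y, q=8, s=2, step=0.1, max_iter=1000,
                          patience=30, seed=0):
    """Simplified random-subspace boosting with multivariable base-learners.

    Illustrative sketch only. Returns (intercept, coef, n_updates).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n, p = X.shape
    x_mean = X.mean(axis=0)
    Xc = X - x_mean
    coef = np.zeros(p)
    resid = y - y.mean()
    idle = 0          # consecutive iterations without an accepted update
    n_updates = 0

    for _ in range(max_iter):
        # Random preselection: only q covariates are considered this round.
        candidates = rng.choice(p, size=min(q, p), replace=False)
        # Baseline is the "empty" update, which leaves the residuals unchanged.
        best_score = bic(resid @ resid, n, 0)
        best_cols, best_beta = None, None
        # Multivariable base-learners: every subset of size 1..s of the candidates.
        for k in range(1, s + 1):
            for subset in itertools.combinations(candidates, k):
                cols = list(subset)
                beta, *_ = np.linalg.lstsq(Xc[:, cols], resid, rcond=None)
                rss = np.sum((resid - Xc[:, cols] @ beta) ** 2)
                score = bic(rss, n, k)
                if score < best_score:
                    best_score, best_cols, best_beta = score, cols, beta
        if best_cols is None:
            # No subset beats the empty update under the criterion.
            idle += 1
            if idle >= patience:
                break
            continue
        idle = 0
        n_updates += 1
        coef[best_cols] += step * best_beta
        resid = resid - step * (Xc[:, best_cols] @ best_beta)

    intercept = y.mean() - x_mean @ coef
    return intercept, coef, n_updates
```

The selected variables can be read off as `np.flatnonzero(coef)`. In this sketch an update is accepted only when it beats the empty update under the criterion, so covariates that add little are left out rather than entering simply because iterations continue. An adaptive variant would replace the uniform `rng.choice` draw with sampling probabilities that are adjusted over the iterations.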

Main Results:

  • Subspace algorithms are beneficial for data with high correlations among covariates.
  • In biomedical applications, the proposed algorithms yield sparser models than classical boosting.
  • They achieve predictive performance competitive with penalized regression methods such as the lasso and elastic net.

Conclusions:

  • Randomized boosting approaches with multivariable base-learners are effective extensions for highly correlated, sparse, high-dimensional data.
  • Selecting base-learners by an information criterion ensures automatic stopping, promoting parsimonious and interpretable models.
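This summary does not say which information criterion is used. As an illustration only, two standard choices for a Gaussian linear model with selected covariate set $S$, sample size $n$, $p$ candidate covariates, and residual sum of squares $\mathrm{RSS}_S$ are the BIC and the extended BIC (stated up to additive constants):

```latex
\mathrm{BIC}(S) = n \log\!\left(\frac{\mathrm{RSS}_S}{n}\right) + |S| \log n,
\qquad
\mathrm{EBIC}_{\gamma}(S) = \mathrm{BIC}(S) + 2\gamma \log \binom{p}{|S|},
\quad \gamma \in [0, 1].
```

Both trade goodness of fit against model size. The extra EBIC term grows with the number of candidate models of size $|S|$, which makes it stricter when $p$ is large relative to $n$.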