



Variable selection with missing data in both covariates and outcomes: Imputation and machine learning.

Liangyuan Hu1, Jung-Yi Joyce Lin2, Jiayi Ji2

  • 1Department of Biostatistics and Epidemiology, Rutgers University School of Public Health, USA.

Statistical Methods in Medical Research | October 26, 2021
Summary
This summary is machine-generated.

This study introduces a machine learning approach to variable selection when both covariates and outcomes have missing values. Extreme gradient boosting and Bayesian additive regression trees selected variables more accurately than the parametric methods compared (lasso and backward stepwise selection).

Keywords:
Missing at random, bootstrap imputation, variable selection, tree ensemble, variable importance


Area of Science:

  • Statistics
  • Machine Learning
  • Data Science

Background:

  • Variable selection is crucial in statistical modeling.
  • Missing covariates and outcomes pose significant challenges.
  • Parametric models rely on modeling assumptions and are vulnerable to misspecification.

Purpose of the Study:

  • To develop a general variable selection approach for data with missing covariates and outcomes.
  • To leverage flexible machine learning models and bootstrap imputation.
  • To compare the performance of various machine learning and parametric methods.

Main Methods:

  • Proposed a general variable selection approach using machine learning and bootstrap imputation.
  • Evaluated four tree-based methods: extreme gradient boosting, random forests, Bayesian additive regression trees, and conditional random forests.
  • Compared performance against two parametric methods: lasso and backward stepwise selection.
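
The general idea of combining bootstrap imputation with a variable selector can be sketched as follows. This is a minimal illustration, not the authors' implementation: mean imputation, the correlation-based selector, the `cutoff` and `min_freq` thresholds, and all function names are assumptions made for this example. The selector is a simple stand-in for the tree-based methods listed above. The sketch resamples the rows, imputes each bootstrap sample, runs the selector on each completed sample, and keeps the covariates that are selected in at least a chosen fraction of samples.

```python
import random


def make_toy_data(n=200, missing_rate=0.1, seed=1):
    """Toy data (hypothetical): the outcome depends on x0 only; x1 and x2 are noise.
    Values are set to None completely at random, in covariates and in the outcome."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x0, x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        y = 2.0 * x0 + rng.gauss(0, 1)
        row = [x0, x1, x2, y]  # last column is the outcome
        rows.append([None if rng.random() < missing_rate else v for v in row])
    return rows


def mean_impute(rows):
    """Placeholder imputation: fill each None with the observed mean of its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def abs_correlation(x, y):
    """Absolute Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def select_once(rows, cutoff):
    """Stand-in selector (an assumption): keep covariates whose absolute
    correlation with the outcome exceeds cutoff."""
    y = [r[-1] for r in rows]
    n_covariates = len(rows[0]) - 1
    return {
        j for j in range(n_covariates)
        if abs_correlation([r[j] for r in rows], y) > cutoff
    }


def bootstrap_impute_select(rows, n_boot=50, cutoff=0.3, min_freq=0.5, seed=0):
    """Resample, impute, select; keep covariates selected in >= min_freq of samples."""
    rng = random.Random(seed)
    n_covariates = len(rows[0]) - 1
    counts = [0] * n_covariates
    for _ in range(n_boot):
        sample = [rng.choice(rows) for _ in rows]  # rows drawn with replacement
        completed = mean_impute(sample)            # impute within the bootstrap sample
        for j in select_once(completed, cutoff):
            counts[j] += 1
    freq = [c / n_boot for c in counts]
    selected = [j for j, f in enumerate(freq) if f >= min_freq]
    return selected, freq


if __name__ == "__main__":
    selected, freq = bootstrap_impute_select(make_toy_data())
    print("selected covariate indices:", selected)
    print("selection frequencies:", freq)
```

In practice, `select_once` would be replaced by a variable-importance rule from a fitted tree ensemble, and `mean_impute` by a more principled imputation model. Aggregating by selection frequency across bootstrap samples means the result does not hinge on any single imputed dataset.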

Main Results:

  • Extreme gradient boosting and Bayesian additive regression trees demonstrated the best variable selection performance.
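  • The two parametric methods, lasso and backward stepwise selection, performed worse than extreme gradient boosting and Bayesian additive regression trees.
  • The choice of imputation method did not significantly affect variable selection performance.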

Conclusions:

  • Flexible machine learning methods, particularly extreme gradient boosting and Bayesian additive regression trees, are effective for variable selection with missing data.
  • The proposed approach offers a robust alternative to traditional parametric methods.
  • The findings are validated through a case study on metabolic syndrome risk factors.