



Variable selection in social-environmental data: sparse regression and tree ensemble machine learning approaches.

Elizabeth Handorf (1), Yinuo Yin (2), Michael Slifker (3)

  • (1) Biostatistics and Bioinformatics Facility, Fox Chase Cancer Center, Reimann 383, 333 Cottman Ave, Philadelphia, PA, 19111, USA. Elizabeth.Handorf@fccc.edu

BMC Medical Research Methodology | December 11, 2020
Summary
This summary is machine-generated.

Machine learning effectively identifies social-environmental factors linked to health outcomes using US Census data. Sparse group lasso regression excelled at finding true associations while minimizing false positives in prostate cancer research.

Keywords:
Social environment; Variable selection



Area of Science:

  • Public Health
  • Data Science
  • Epidemiology

Background:

  • US Census data offers valuable insights into health disparities but is underutilized due to challenges in variable selection.
  • Researchers often manually select a limited number of variables, potentially missing crucial social-environmental factors.
  • This study addresses the need for robust methods to identify relevant variables from large datasets.

Purpose of the Study:

  • To evaluate empirical machine learning approaches for identifying social-environmental factors associated with health outcomes.
  • To compare the performance of various machine learning methods in variable selection using simulated and real-world data.
  • To apply the best-performing method to identify factors associated with advanced prostate cancer from comprehensive US Census data.

Main Methods:

  • Compared penalized regression (lasso, elastic net) and tree ensemble methods via simulation.
  • Assessed methods' ability to detect true associations and control false positives with simulated data (10 true, 1000 total variables).
  • Applied the optimal method to linked US Census (14,663 variables) and prostate cancer registry data (76,186 cases).
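
The simulation step above scores each method on how many true variables it finds and how many false ones it lets through. A minimal sketch of that scoring follows. The function name, the metric names, and the selected indices are hypothetical illustrations; this summary does not state which metrics the authors used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SelectionScore:
    true_positives: int
    false_positives: int
    sensitivity: float           # share of true variables that were selected
    false_discovery_rate: float  # share of selected variables that are not true


def score_selection(selected, truth):
    """Compare a set of selected variable indices against the known true set."""
    selected, truth = set(selected), set(truth)
    tp = len(selected & truth)
    fp = len(selected - truth)
    sensitivity = tp / len(truth) if truth else 0.0
    fdr = fp / len(selected) if selected else 0.0
    return SelectionScore(tp, fp, sensitivity, fdr)


# Hypothetical run mirroring the simulated dimensions: 1000 candidates, 10 true.
truth = range(10)  # assume variables 0..9 carry the real signal
selected = [0, 1, 2, 3, 4, 5, 6, 7, 500, 731]  # made-up output of some selector
score = score_selection(selected, truth)
print(score)
```

With 10 true variables among 1000 candidates, a method that selects liberally can reach high sensitivity while accumulating false positives, which is why both quantities are tracked together.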

Main Results:

  • Elastic net identified numerous true positives; lasso controlled false positives effectively.
  • Sparse group lasso regression, combined with hierarchical clustering, demonstrated superior overall accuracy in simulations.
  • Bayesian Additive Regression Trees (BART) were outperformed by sparse group lasso.
  • Sparse group lasso identified a relevant subset of variables from the full dataset, with three replicating prior findings.
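
For orientation, the three penalized regression estimators named in these results can be written in their common textbook forms. This is a sketch under assumptions: the summary does not give the paper's exact parameterization, so the symbols here are illustrative. $L(\beta)$ is the model's loss (for example, a negative log-likelihood), $\lambda \ge 0$ sets the overall penalty strength, $\alpha \in [0, 1]$ mixes the penalty terms, and $\beta^{(g)}$ holds the $p_g$ coefficients in group $g$ of $G$ groups. In this study, the groups would come from the hierarchical clustering step.

```latex
\begin{aligned}
\text{lasso:} \quad
  & \hat\beta = \arg\min_{\beta} \; L(\beta) + \lambda \lVert \beta \rVert_1 \\
\text{elastic net:} \quad
  & \hat\beta = \arg\min_{\beta} \; L(\beta)
    + \lambda \left( \alpha \lVert \beta \rVert_1
    + \tfrac{1-\alpha}{2} \lVert \beta \rVert_2^2 \right) \\
\text{sparse group lasso:} \quad
  & \hat\beta = \arg\min_{\beta} \; L(\beta)
    + (1-\alpha)\,\lambda \sum_{g=1}^{G} \sqrt{p_g}\, \lVert \beta^{(g)} \rVert_2
    + \alpha\,\lambda \lVert \beta \rVert_1
\end{aligned}
```

The group term can zero out an entire cluster of correlated variables, while the $\ell_1$ term still selects individual variables within the clusters that remain. This is consistent with the reported balance between true positives and false positives.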

Conclusions:

  • Empirical machine learning can successfully pinpoint a small subset of census variables truly associated with health outcomes.
  • Sparse clustered regression models proved most effective, balancing the identification of true positives with the control of false discoveries.
  • This approach enhances the utility of large social-environmental datasets for health research.