Related Concept Videos

Multiple Regression

Multiple regression assesses a linear relationship between one response or dependent variable and two or more independent variables. It has many practical applications.
Farmers can use multiple regression to predict crop yield from several factors, such as water availability, fertilizer, and soil properties. Here, crop yield is the response or dependent variable because it depends on the other, independent variables. The analysis requires the construction of a scatter plot...
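A minimal sketch of how such a model is commonly fitted, by ordinary least squares, assuming NumPy is available. The variables `water`, `fertilizer`, and `crop_yield`, their units, and the simulated values are hypothetical illustrations, not data from the source.

```python
import numpy as np


def fit_multiple_regression(X, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    X: (n, k) array of independent variables; y: (n,) response.
    Returns the coefficient vector [b0, b1, ..., bk].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept b0.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# Hypothetical field data: water (mm) and fertilizer (kg/ha) -> yield (t/ha).
rng = np.random.default_rng(0)
water = rng.uniform(200, 600, size=50)
fertilizer = rng.uniform(0, 100, size=50)
crop_yield = 1.5 + 0.004 * water + 0.02 * fertilizer + rng.normal(0, 0.1, size=50)

b0, b_water, b_fert = fit_multiple_regression(np.column_stack([water, fertilizer]), crop_yield)
print(f"yield ~ {b0:.2f} + {b_water:.4f}*water + {b_fert:.3f}*fertilizer")
```

Least squares picks the coefficients that minimize the sum of squared vertical distances between the observed responses and the fitted plane; with two predictors, the fitted model is a plane rather than a line.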

Randomized Experiments

The randomization process involves assigning study participants at random to experimental or control groups, so that each participant has an equal probability of being assigned to either group. Randomization is meant to eliminate selection bias and to balance known and unknown confounding factors, so that the control group is as similar as possible to the treatment group. A computer program with a random number generator can be used to assign participants to groups in a way that minimizes bias.
Simple randomization
Simple...
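A minimal sketch of simple randomization with Python's standard `random` module: each participant is assigned independently, with equal probability, to one group. The participant IDs and the seed are hypothetical; a fixed seed is used only so the allocation can be reproduced and audited.

```python
import random
from collections import Counter


def simple_randomize(participants, groups=("treatment", "control"), seed=None):
    """Simple randomization: each participant is assigned independently,
    with equal probability, to one of the groups."""
    rng = random.Random(seed)  # seeded generator makes the allocation reproducible
    return {person: rng.choice(groups) for person in participants}


participants = [f"P{i:02d}" for i in range(1, 21)]
allocation = simple_randomize(participants, seed=2024)
print(Counter(allocation.values()))  # group sizes need not come out equal
```

Because every assignment is an independent coin flip, small trials can end up with unequal group sizes; that is one reason restricted schemes such as block randomization are also used.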

Regression Analysis

Regression analysis is a statistical tool that describes a mathematical relationship between a dependent variable and one or more independent variables.
In regression analysis, the regression equation is determined from the line of best fit, the line that best fits the data points plotted on a graph. This line is also called the regression line. The algebraic equation for the regression line is called the regression equation. It is represented as:

Introduction To Survival Analysis

Survival analysis is a statistical method used to study time-to-event data, where the "event" might represent outcomes like death, disease relapse, system failure, or recovery. A unique feature of survival data is censoring, which occurs when the event of interest has not been observed for some individuals during the study period. This requires specialized techniques to handle incomplete data effectively.
The primary goal of survival analysis is to estimate survival time: the time...
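The excerpt does not name a specific technique, so the following is only one widely used option: a minimal sketch of the Kaplan-Meier estimator, which handles right-censored subjects by keeping them in the risk set until their censoring time. The follow-up data below are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function S(t).

    times:  follow-up time for each subject
    events: 1 if the event was observed at that time, 0 if the subject was censored
    Returns a list of (t, S(t)) pairs at each distinct time where an event occurred.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects sharing time t; censored subjects still count as at risk at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical follow-up data (months); 0 marks a censored subject.
times = [2, 3, 3, 5, 6, 8]
events = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(times, events):
    print(f"S({t}) = {s:.3f}")
```

This prints S(2) = 0.833, S(3) = 0.667, S(5) = 0.444, and S(8) = 0.000. The subject censored at month 6 never contributes an event but does shrink the risk set for later times.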

Goodness-of-Fit Test

The goodness-of-fit test is a type of hypothesis test that determines whether data "fit" a particular distribution. For example, one may suspect that an anonymous data set follows a binomial distribution. A chi-square test (meaning the distribution for the hypothesis test is chi-square) can be used to determine whether there is a fit. The null and alternative hypotheses may be written in sentences or stated as equations or inequalities. The test statistic for a goodness-of-fit test is given as...
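A minimal sketch of the standard Pearson form of the statistic, χ² = Σ (O − E)² / E summed over categories, compared against a chi-square distribution with (number of categories − 1 − number of estimated parameters) degrees of freedom. The die-rolling counts are hypothetical, and the critical value 11.070 is the upper 5% point of the chi-square distribution with 5 degrees of freedom; SciPy's `scipy.stats.chisquare` computes the same statistic together with a p-value if SciPy is available.

```python
def chi_square_gof(observed, expected):
    """Pearson chi-square goodness-of-fit statistic: sum of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of categories")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical example: is a six-sided die fair? 120 rolls, so E = 20 per face.
observed = [25, 17, 15, 23, 24, 16]
expected = [sum(observed) / 6] * 6
stat = chi_square_gof(observed, expected)
df = len(observed) - 1      # no distribution parameters were estimated from the data
critical_value = 11.070     # upper 5% point of chi-square with 5 df
print(round(stat, 3), "reject H0" if stat > critical_value else "fail to reject H0")
```

Here the statistic is 5.0, below the critical value, so the data give no evidence against a fair die at the 5% level. Testing a binomial fit works the same way, with expected counts taken from the binomial probabilities.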

Parametric Survival Analysis: Weibull and Exponential Methods

Parametric survival analysis models survival data by assuming a specific probability distribution for the time until an event occurs. The Weibull and exponential distributions are two of the most commonly used distributions in this context, owing to their versatility and relatively straightforward application.
Weibull Distribution
The Weibull distribution is a flexible model used in parametric survival analysis. It can handle both increasing and decreasing hazard rates, depending on its shape parameter...
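A minimal sketch of how the shape parameter controls the hazard, assuming the common shape k / scale λ parameterization, with hazard h(t) = (k/λ)(t/λ)^(k−1) and survival S(t) = exp(−(t/λ)^k); other parameterizations (for example, with a rate instead of a scale) exist. Setting k = 1 gives the exponential model with constant hazard 1/λ. The scale value of 10 is arbitrary.

```python
import math


def weibull_hazard(t, shape, scale):
    """Weibull hazard h(t) = (k / lam) * (t / lam)**(k - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def weibull_survival(t, shape, scale):
    """Weibull survival function S(t) = exp(-(t / lam)**k)."""
    return math.exp(-((t / scale) ** shape))


# shape < 1: decreasing hazard; shape == 1: constant (exponential); shape > 1: increasing.
for k in (0.5, 1.0, 2.0):
    print(k, [round(weibull_hazard(t, k, scale=10.0), 4) for t in (1.0, 5.0, 10.0)])
```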

You might also read

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

JADE: Joint Alignment and Deep Embedding for Multi-Slice Spatial Transcriptomics.

Advances in neural information processing systems·2026

Same author

Phenotypic prediction of missense variants via deep contrastive learning.

Nature biomedical engineering·2026

Same author

DEDUCE: statistical inference on disease-associated genes uncovers tissue-disease associations.

NAR genomics and bioinformatics·2026

Same author

Designing strongly coupled polaritonic structures via statistical machine learning.

Proceedings of the National Academy of Sciences of the United States of America·2025

Same author

JADE: Joint Alignment and Deep Embedding for Multi-Slice Spatial Transcriptomics.

bioRxiv : the preprint server for biology·2025

Same author

Participation bias in the estimation of heritability and genetic correlation.

Proceedings of the National Academy of Sciences of the United States of America·2025

Same journal

Instrumental Variable Estimation of Marginal Structural Mean Models for Time-Varying Treatment.

Journal of the American Statistical Association·2026

Same journal

Semiparametric Joint Modeling for Survival Analysis with Longitudinal Covariates.

Journal of the American Statistical Association·2026

Same journal

Dimension Reduction for Large-Scale Federated Data: Statistical Rate and Asymptotic Inference.

Journal of the American Statistical Association·2026

Same journal

Facilitating Heterogeneous Effect Estimation via Statistically Efficient Categorical Modifiers.

Journal of the American Statistical Association·2026

Same journal

Nonparametric Density Estimation of a Long-Term Trend from Repeated Semicontinuous Data.

Journal of the American Statistical Association·2026

Same journal

Functional Integrative Bayesian Analysis of High-dimensional Multiplatform Clinicogenomic Data.

Journal of the American Statistical Association·2026

Related Experiment Video

Updated: Dec 10, 2025

Development of an Individual-Tree Basal Area Increment Model using a Linear Mixed-Effects Approach

Published on: July 3, 2020

Robust Variable and Interaction Selection for Logistic Regression and General Index Models.

Yang Li¹, Jun S Liu¹

¹Yang Li is Sr. Market Scientist, Vatic Labs LLC, New York, NY 10036. Jun S Liu is Professor, Department of Statistics, Harvard University, Cambridge, MA 02138, and is also Co-Director of the Center for Statistical Science, Department of Industrial Engineering, Tsinghua University, Beijing, China.

Journal of the American Statistical Association

September 1, 2020

Summary

This summary is machine-generated.

SODA, a new variable selection method, efficiently handles high-dimensional data in logistic and general index models. It offers enhanced robustness and superior performance, especially with non-Gaussian data.

Keywords:

Classification; Forward screening; High-dimensional; Quadratic discriminant analysis; Semi-parametric; Stepwise selection

More Related Videos

The Innovation Arena: A Method for Comparing Innovative Problem-Solving Across Groups

Published on: May 13, 2022

Lexical Decision Task for Studying Written Word Recognition in Adults with and without Dementia or Mild Cognitive Impairment

Published on: June 25, 2019

Area of Science:

Statistics
Machine Learning
Data Science

Background:

Variable selection is crucial for building robust statistical models.
Existing methods often struggle with high-dimensional data or rely on specific distributional assumptions, such as joint normality of the predictors.

Purpose of the Study:

To introduce SODA, a novel forward-backward variable selection method.
To extend SODA for application in general index models.
To enhance robustness and performance in high-dimensional settings.
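For orientation, the two model classes named here can be written in their usual textbook forms; the notation below (β, γ, f, d) is an assumption for illustration and may not match the paper's. A logistic regression with main effects and pairwise interactions is the form the log-odds take when each class is Gaussian with its own covariance matrix, the quadratic discriminant analysis setting listed in the keywords. A general index model lets the response depend on the predictors only through d linear projections.

```latex
\log\frac{P(Y=1 \mid \mathbf{x})}{P(Y=0 \mid \mathbf{x})}
  = \beta_0 + \sum_{j=1}^{p} \beta_j x_j + \sum_{1 \le j \le k \le p} \gamma_{jk}\, x_j x_k

Y = f\left(\boldsymbol{\beta}_1^{\top}\mathbf{x}, \ldots, \boldsymbol{\beta}_d^{\top}\mathbf{x}, \varepsilon\right)
```

In both cases, variable selection amounts to identifying which coordinates x_j carry a nonzero main-effect or interaction coefficient, or a nonzero entry in some projection vector.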

Main Methods:

SODA employs a forward stage to add significant predictors and a backward stage to remove unimportant terms, optimizing the extended Bayesian Information Criterion (EBIC).
The method is extended for variable selection and model fitting in general index models.
Theoretical analysis confirms variable-selection consistency under high-dimensional conditions.
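The forward-backward idea can be illustrated with a generic stepwise search, assuming NumPy is available. This is not the SODA algorithm: it considers main effects only (SODA also selects interaction terms and extends to general index models), and it uses one common form of the EBIC, −2 log L + k log n + 2γ log C(p, k), for a model with k of p candidate predictors; the paper's exact criterion may differ. The helper names, γ = 1, and the simulated data are all hypothetical.

```python
import math

import numpy as np


def logistic_loglik(X, y, n_iter=100, tol=1e-8):
    """Maximized log-likelihood of a logistic regression (with intercept), via Newton-Raphson."""
    Z = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        eta = Z @ beta
        prob = 0.5 * (1.0 + np.tanh(eta / 2.0))  # numerically stable sigmoid
        grad = Z.T @ (y - prob)
        # Fisher information; the tiny ridge only keeps it invertible, the fixed point is still the MLE.
        info = Z.T @ (Z * (prob * (1.0 - prob))[:, None]) + 1e-9 * np.eye(Z.shape[1])
        step = np.linalg.solve(info, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    eta = Z @ beta
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))


def ebic(loglik, k, n, p, gamma=1.0):
    """EBIC = -2 loglik + k log n + 2 gamma log C(p, k)."""
    log_binom = math.lgamma(p + 1) - math.lgamma(k + 1) - math.lgamma(p - k + 1)
    return -2.0 * loglik + k * math.log(n) + 2.0 * gamma * log_binom


def model_ebic(X, y, subset, gamma):
    """EBIC of the logistic model that uses the predictor columns in `subset`."""
    n, p = X.shape
    cols = np.array(sorted(subset), dtype=int)
    return ebic(logistic_loglik(X[:, cols], y), len(subset), n, p, gamma)


def forward_backward(X, y, gamma=1.0):
    """Generic forward-backward stepwise selection minimizing EBIC (main effects only)."""
    p = X.shape[1]
    selected = set()
    best = model_ebic(X, y, selected, gamma)
    # Forward stage: add the predictor that lowers EBIC the most, while any addition helps.
    while len(selected) < p:
        value, j = min((model_ebic(X, y, selected | {c}, gamma), c) for c in range(p) if c not in selected)
        if value >= best:
            break
        selected.add(j)
        best = value
    # Backward stage: drop the predictor whose removal lowers EBIC the most, while any removal helps.
    while selected:
        value, j = min((model_ebic(X, y, selected - {c}, gamma), c) for c in selected)
        if value >= best:
            break
        selected.remove(j)
        best = value
    return sorted(selected), best


# Hypothetical simulated data: only predictors 0 and 3 affect the outcome.
rng = np.random.default_rng(1)
n, p = 400, 10
X = rng.standard_normal((n, p))
eta_true = 1.5 * X[:, 0] - 1.5 * X[:, 3]
y = (rng.random(n) < 0.5 * (1.0 + np.tanh(eta_true / 2.0))).astype(float)
chosen, score = forward_backward(X, y)
print("selected predictors:", chosen)
```

The log C(p, k) term is what distinguishes EBIC from ordinary BIC: it grows with the number of candidate predictors, which discourages spurious additions when p is large relative to n.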

Main Results:

SODA effectively handles high-dimensional data in which the number of predictors exceeds the sample size.
It does not require a joint normality assumption, which makes it more robust than existing methods.
SODA demonstrates superior performance with non-Gaussian design matrices in simulations and real-data applications.

Conclusions:

SODA provides a robust and consistent variable selection approach for logistic and general index models.
The method is particularly advantageous for high-dimensional and non-Gaussian datasets.
SODA offers a significant advancement over existing variable selection techniques.