Related Concept Videos

Multiple Regression (01:25)

Multiple regression assesses the linear relationship between one response (dependent) variable and two or more independent variables. It has many practical applications.
Farmers, for example, can use multiple regression to predict crop yield from several factors, such as water availability, fertilizer, and soil properties. Here, crop yield is the response or dependent variable because it depends on the other, independent variables. The analysis requires the construction of a scatter plot...
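A minimal sketch of the idea, assuming ordinary least squares: the code fits a model of the form yield = b0 + b1·water + b2·fertilizer by solving the normal equations (XᵀX)b = Xᵀy in plain Python. The helper names and the crop data are hypothetical, generated from known coefficients (2, 3, 0.5) so the fit can be checked.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_multiple_regression(rows, y):
    """Return [b0, b1, ..., bk] minimizing squared error for y ~ b0 + sum(bj * xj)."""
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    k = len(X[0])
    xtx = [[sum(xi[a] * xi[c] for xi in X) for c in range(k)] for a in range(k)]
    xty = [sum(xi[a] * yi for xi, yi in zip(X, y)) for a in range(k)]
    return solve(xtx, xty)


# Hypothetical data: (water, fertilizer) -> yield, built as 2 + 3*water + 0.5*fertilizer.
rows = [(1, 10), (2, 20), (3, 15), (4, 30), (5, 25)]
yields = [2 + 3 * w + 0.5 * f for w, f in rows]
coef = fit_multiple_regression(rows, yields)
print([round(c, 6) for c in coef])  # [2.0, 3.0, 0.5]
```

With real, noisy measurements the recovered coefficients only approximate the underlying relationship, so plotting each predictor against the response first is a sensible check for linearity.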

Randomized Experiments (01:13)

The randomization process assigns study participants to experimental or control groups by chance, so that every participant has the same probability of being assigned to each group. Randomization is meant to eliminate selection bias and to balance known and unknown confounding factors, so that the control group is as similar as possible to the treatment group. A computer program with a random number generator can be used to assign participants to groups in a way that minimizes bias.
Simple randomization
Simple...
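A minimal sketch of simple randomization, assuming a 1:1 allocation and using Python's standard `random` module as the random number generator: each participant is assigned to treatment or control independently with probability 0.5. The function name and participant IDs are hypothetical.

```python
import random


def simple_randomization(participants, seed=None):
    """Assign each participant independently to 'treatment' or 'control' with probability 0.5."""
    rng = random.Random(seed)  # seeded generator so the allocation can be reproduced
    return {p: ("treatment" if rng.random() < 0.5 else "control") for p in participants}


# Hypothetical participant IDs P001..P020.
participants = [f"P{i:03d}" for i in range(1, 21)]
allocation = simple_randomization(participants, seed=42)
n_treat = sum(1 for g in allocation.values() if g == "treatment")
print(n_treat, len(participants) - n_treat)  # group sizes need not be equal
```

Because each assignment is an independent coin flip, simple randomization can produce unequal group sizes in small samples; fixing the seed makes a given allocation reproducible for auditing.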

Regression Analysis (01:11)

Regression analysis is a statistical tool that describes a mathematical relationship between a dependent variable and one or more independent variables.
In regression analysis, a regression equation is determined from the line of best fit, the line that best fits the data points plotted on a graph. This line is also called the regression line, and its algebraic equation is called the regression equation. It is represented as...
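As a minimal sketch, assuming the least-squares criterion and writing the line as ŷ = a + bx (our notation, which may differ from the video's), the slope and intercept can be computed as b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄. The function name and data are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line y_hat = a + b*x that minimizes the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return a, b


# Hypothetical paired observations.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = least_squares_line(xs, ys)
print(f"y_hat = {a:.2f} + {b:.2f}x")  # y_hat = 0.05 + 1.99x
```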

Introduction To Survival Analysis (01:18)

Survival analysis is a statistical method used to study time-to-event data, where the "event" might represent outcomes like death, disease relapse, system failure, or recovery. A unique feature of survival data is censoring, which occurs when the event of interest has not been observed for some individuals during the study period. This requires specialized techniques to handle incomplete data effectively.
The primary goal of survival analysis is to estimate survival time—the time...
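One standard way to estimate survival from censored data is the Kaplan-Meier product-limit estimator; the teaser does not name it, so it is used here as an illustrative assumption. At each distinct event time t, the running survival estimate is multiplied by (1 − d/n), where d is the number of events at t and n the number still at risk; censored subjects leave the risk set without counting as events. The function name and follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of S(t) (an illustrative choice of estimator).

    times  -- follow-up time for each subject
    events -- 1 if the event was observed at that time, 0 if the subject was censored
    Returns a list of (event_time, survival_probability) steps.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Group all subjects that share this time point.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= deaths + censored  # everyone observed at t leaves the risk set
    return curve


# Hypothetical follow-up times (months); an event flag of 0 marks a censored subject.
times = [2, 3, 3, 5, 8, 8, 9, 12]
events = [1, 1, 0, 1, 0, 1, 1, 0]
for t, s in kaplan_meier(times, events):
    print(f"t={t}: S(t)={s:.3f}")
```

Subjects censored at a tied time still count in the risk set at that time, which follows the usual convention of processing events before censorings.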

Goodness-of-Fit Test (01:16)

The goodness-of-fit test is a type of hypothesis test that determines whether the data "fits" a particular distribution. For example, one may suspect that some anonymous data fit a binomial distribution. A chi-square test (meaning that the distribution for the hypothesis test is chi-square) can be used to determine whether there is a fit. The null and alternative hypotheses may be written in sentences or stated as equations or inequalities. The test statistic for a goodness-of-fit test is given as...
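A minimal sketch, assuming the standard Pearson statistic χ² = Σ (O − E)² / E over k categories, with k − 1 degrees of freedom when no parameters are estimated from the data. The function name and the die-roll counts are hypothetical.

```python
def chi_square_gof(observed, expected):
    """Pearson goodness-of-fit statistic: sum over categories of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of categories")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical: 60 rolls of a die; H0 says each face is equally likely (expected 10 each).
observed = [8, 12, 9, 11, 6, 14]
expected = [sum(observed) / 6] * 6
stat = chi_square_gof(observed, expected)
df = len(observed) - 1
print(f"chi-square = {stat:.2f} with {df} degrees of freedom")  # chi-square = 4.20 with 5 degrees of freedom
```

Compared with the 5% critical value of the chi-square distribution with 5 degrees of freedom (about 11.07), a statistic of 4.20 gives no evidence against a fair die for these made-up counts.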

Parametric Survival Analysis: Weibull and Exponential Methods (01:14)

Parametric survival analysis models survival data by assuming a specific probability distribution for the time until an event occurs. The Weibull and exponential distributions are two of the most commonly used methods in this context, due to their versatility and relatively straightforward application.
Weibull Distribution
The Weibull distribution is a flexible model used in parametric survival analysis. It can handle both increasing and decreasing hazard rates, depending on its shape parameter...
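Assuming the common shape-scale parameterization (shape k, scale λ), the Weibull hazard is h(t) = (k/λ)(t/λ)^(k−1): it decreases over time when k < 1, is constant when k = 1 (the exponential model with rate 1/λ), and increases when k > 1. The sketch below evaluates the hazard and the survival function S(t) = exp(−(t/λ)^k); the function names and parameter values are hypothetical.

```python
import math


def weibull_hazard(t, shape, scale):
    """Hazard h(t) = (shape/scale) * (t/scale)**(shape - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def weibull_survival(t, shape, scale):
    """Survival S(t) = exp(-(t/scale)**shape)."""
    return math.exp(-((t / scale) ** shape))


scale = 10.0  # hypothetical scale parameter
for shape in (0.5, 1.0, 2.0):  # decreasing, constant (exponential), increasing hazard
    hazards = [round(weibull_hazard(t, shape, scale), 4) for t in (1, 5, 10)]
    print(f"shape={shape}: h(1), h(5), h(10) = {hazards}")
```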

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

JADE: Joint Alignment and Deep Embedding for Multi-Slice Spatial Transcriptomics.

Advances in neural information processing systems·2026
Same author

Phenotypic prediction of missense variants via deep contrastive learning.

Nature biomedical engineering·2026
Same author

DEDUCE: statistical inference on disease-associated genes uncovers tissue-disease associations.

NAR genomics and bioinformatics·2026
Same author

Designing strongly coupled polaritonic structures via statistical machine learning.

Proceedings of the National Academy of Sciences of the United States of America·2025
Same author

JADE: Joint Alignment and Deep Embedding for Multi-Slice Spatial Transcriptomics.

bioRxiv : the preprint server for biology·2025
Same author

Participation bias in the estimation of heritability and genetic correlation.

Proceedings of the National Academy of Sciences of the United States of America·2025
Same journal

Instrumental Variable Estimation of Marginal Structural Mean Models for Time-Varying Treatment.

Journal of the American Statistical Association·2026
Same journal

Semiparametric Joint Modeling for Survival Analysis with Longitudinal Covariates.

Journal of the American Statistical Association·2026
Same journal

Dimension Reduction for Large-Scale Federated Data: Statistical Rate and Asymptotic Inference.

Journal of the American Statistical Association·2026
Same journal

Facilitating Heterogeneous Effect Estimation via Statistically Efficient Categorical Modifiers.

Journal of the American Statistical Association·2026
Same journal

Nonparametric Density Estimation of a Long-Term Trend from Repeated Semicontinuous Data.

Journal of the American Statistical Association·2026
Same journal

Functional Integrative Bayesian Analysis of High-dimensional Multiplatform Clinicogenomic Data.

Journal of the American Statistical Association·2026

Related Experiment Video

Updated: Dec 10, 2025

Development of an Individual-Tree Basal Area Increment Model using a Linear Mixed-Effects Approach (04:35)

Published on: July 3, 2020

Robust Variable and Interaction Selection for Logistic Regression and General Index Models.

Yang Li¹, Jun S Liu¹

  • ¹Yang Li is Sr. Market Scientist, Vatic Labs LLC, New York, NY 10036. Jun S Liu is Professor, Department of Statistics, Harvard University, Cambridge, MA 02138, and Co-Director of the Center for Statistical Science, Department of Industrial Engineering, Tsinghua University, Beijing, China.

Journal of the American Statistical Association | September 1, 2020
PubMed
Summary
This summary is machine-generated.

SODA, a new variable selection method, efficiently selects variables and interactions in high-dimensional logistic and general index models. It is more robust than existing methods and performs better, especially with non-Gaussian data.

Keywords:
Classification; Forward screening; High-dimensional; Quadratic discriminant analysis; Semi-parametric; Stepwise selection

More Related Videos

The Innovation Arena: A Method for Comparing Innovative Problem-Solving Across Groups (14:14)

Published on: May 13, 2022

Lexical Decision Task for Studying Written Word Recognition in Adults with and without Dementia or Mild Cognitive Impairment (06:48)

Published on: June 25, 2019

Area of Science:

  • Statistics
  • Machine Learning
  • Data Science

Background:

  • Variable selection is crucial for building robust statistical models.
  • Existing methods often struggle with high-dimensional data or rely on restrictive distributional assumptions, such as joint normality.

Purpose of the Study:

  • To introduce SODA, a novel forward-backward variable selection method.
  • To extend SODA for application in general index models.
  • To enhance robustness and performance in high-dimensional settings.

Main Methods:

  • SODA employs a forward stage to add significant predictors and a backward stage to remove unimportant terms, optimizing the extended Bayesian Information Criterion (EBIC).
  • The method is extended for variable selection and model fitting in general index models.
  • Theoretical analysis confirms variable-selection consistency under high-dimensional conditions.
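The two-stage search described above can be illustrated with a simplified sketch; it is not the authors' SODA implementation. Assumptions: only main-effect logistic regressions are fitted (SODA also searches over interaction terms and extends to general index models), and the criterion is the Chen-Chen form of the extended BIC, EBIC = −2 log L + |S| log n + 2γ log C(p, |S|), where S is the selected set, n the sample size, p the number of candidate predictors, and γ ∈ [0, 1] a tuning constant; the paper's exact EBIC may differ. The helper names, γ = 0.5, the Newton-Raphson fitter, and the simulated data are hypothetical.

```python
import math
import random

# Hypothetical illustration of EBIC-guided forward-backward selection, not the SODA reference code.


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_loglik(X, y, cols, max_iter=50):
    """Maximized log-likelihood of a logistic model with an intercept plus columns `cols` (Newton-Raphson)."""
    Z = [[1.0] + [row[j] for j in cols] for row in X]
    k = len(Z[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        eta = [sum(b * z for b, z in zip(beta, zi)) for zi in Z]
        prob = [1.0 / (1.0 + math.exp(-e)) for e in eta]
        grad = [sum((yi - pi) * zi[a] for zi, yi, pi in zip(Z, y, prob)) for a in range(k)]
        # Z^T W Z plus a tiny ridge term that only guards against a singular matrix.
        hess = [[sum(pi * (1 - pi) * zi[a] * zi[c] for zi, pi in zip(Z, prob)) + (1e-9 if a == c else 0.0)
                 for c in range(k)] for a in range(k)]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-8:
            break
    eta = [sum(b * z for b, z in zip(beta, zi)) for zi in Z]
    # log L = sum(y * eta - log(1 + e^eta)), with log(1 + e^eta) computed stably.
    return sum(yi * e - (max(e, 0.0) + math.log1p(math.exp(-abs(e)))) for yi, e in zip(y, eta))


def ebic(loglik, size, n, p, gamma=0.5):
    """Extended BIC in the Chen-Chen form (assumed): -2 logL + |S| log n + 2 gamma log C(p, |S|)."""
    return -2 * loglik + size * math.log(n) + 2 * gamma * math.log(math.comb(p, size))


def forward_backward(X, y, gamma=0.5):
    """Greedy forward additions, then backward deletions; a move is kept only if it lowers EBIC."""
    n, p = len(X), len(X[0])

    def score(cols):
        return ebic(fit_loglik(X, y, cols), len(cols), n, p, gamma)

    selected, best = [], score([])
    # Forward stage: add the predictor that lowers EBIC the most, while any predictor does.
    while len(selected) < p:
        value, j = min((score(selected + [j]), j) for j in range(p) if j not in selected)
        if value >= best:
            break
        selected, best = selected + [j], value
    # Backward stage: drop the predictor whose removal lowers EBIC the most, while any does.
    while selected:
        value, j = min((score([c for c in selected if c != j]), j) for j in selected)
        if value >= best:
            break
        selected, best = [c for c in selected if c != j], value
    return sorted(selected)


# Hypothetical simulated data: only predictors 0 and 3 affect the outcome.
rng = random.Random(1)
n, p = 200, 6
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(1.5 * x[0] - 1.5 * x[3]))) else 0 for x in X]
selected = forward_backward(X, y)
print(selected)  # the true predictors 0 and 3 should be among those selected
```

Because the log C(p, |S|) term grows with p, this criterion penalizes each addition more heavily than the ordinary BIC when candidates are numerous, which is intended to curb false inclusions; refitting every candidate model from scratch, as done here, is only practical for small p.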

Main Results:

  • SODA effectively handles high-dimensional data in which the number of predictors exceeds the sample size.
  • It does not require joint normality assumptions, offering greater robustness than existing methods.
  • SODA demonstrates superior performance with non-Gaussian design matrices in simulations and real-data applications.

Conclusions:

  • SODA provides a robust and consistent variable selection approach for logistic and general index models.
  • The method is particularly advantageous for high-dimensional and non-Gaussian datasets.
  • SODA offers a significant advancement over existing variable selection techniques.