Distribution-free model selection for longitudinal zero-inflated count data with missing responses and covariates.

Chun-Shu Chen¹, Chung-Wei Shen²

¹Graduate Institute of Statistics, National Central University, Taoyuan, Taiwan, Republic of China.

Statistics in Medicine

April 16, 2022

Summary

This summary is machine-generated.

This study introduces a new method for analyzing complex count data with many zeros and missing values, common in medical and social sciences. The approach helps identify key factors influencing outcomes, even with incomplete data.

Keywords:

generalized estimating equations
missing at random
two-component mixture models
variable selection
zero-inflation

Area of Science:

Biostatistics
Epidemiology
Longitudinal Data Analysis

Background:

Count data with excess zeros and clustered correlations are prevalent in medical and social science research.
Existing models such as the zero-inflated binomial (ZIB), zero-inflated negative binomial (ZINB), and zero-inflated Poisson (ZIP) are limited when data are missing or when their distributional assumptions are violated.
Semiparametric weighted generalized estimating equations offer a robust approach for handling missingness in longitudinal count data.
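
For reference, the zero-inflated Poisson model named above is usually written as a two-component mixture. The symbols here are notation introduced for illustration and are not taken from the paper: π is the probability of a structural zero and λ is the mean of the Poisson component.

```latex
P(Y = y) =
\begin{cases}
\pi + (1 - \pi)\, e^{-\lambda}, & y = 0, \\[4pt]
(1 - \pi)\, \dfrac{e^{-\lambda} \lambda^{y}}{y!}, & y = 1, 2, \dots
\end{cases}
\qquad
E[Y] = (1 - \pi)\, \lambda .
```

The mixture inflates the probability of zero above what a Poisson distribution alone allows. An estimating-equation approach needs only a model for the mean E[Y] rather than the full probability function, which is one way to read the "distribution-free" label in the title.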

Purpose of the Study:

To propose a distribution-free model selection criterion for identifying important covariates in longitudinal count data with excess zeros and missingness.
To evaluate the performance of the proposed covariate selection method under various scenarios of excess zeros and missing data.
To illustrate the application of the method using a real-world cardiovascular disease dataset.

Main Methods:

Development of a model selection criterion based on expected weighted quadratic loss for covariate selection.
Application of semiparametric weighted generalized estimating equations to handle non-monotone missingness in responses and covariates.
Simulation studies to assess covariate selection performance under different percentages of excess zeros and missing data.
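
The ideas in the list above can be illustrated with a much simplified sketch. This is not the authors' criterion or code: it assumes independent observations instead of longitudinal clusters, missingness in the response only, known observation probabilities, and a plain hold-out estimate of the weighted quadratic loss in place of the paper's expected-loss criterion, whose exact form is not given here. All function names, parameter values, and the simulated data are hypothetical.

```python
import numpy as np


def simulate(n, rng, zero_prob=0.3):
    """Simulate zero-inflated counts with responses missing at random.

    Hypothetical setup: only x1 affects the mean; x2 and x3 are noise.
    """
    x = rng.normal(size=(n, 3))
    X = np.column_stack([np.ones(n), x])
    beta_true = np.array([0.5, 0.8, 0.0, 0.0])
    counts = rng.poisson(np.exp(X @ beta_true))
    structural_zero = rng.random(n) < zero_prob
    y = np.where(structural_zero, 0, counts)
    # Probability that the response is observed depends on x1 only (MAR).
    p_obs = 1.0 / (1.0 + np.exp(-(1.0 + 0.5 * x[:, 0])))
    observed = rng.random(n) < p_obs
    # Inverse probability weights: zero for unobserved responses.
    w = observed / p_obs
    return X, y, w


def fit_weighted_loglinear(X, y, w, n_iter=50):
    """Solve sum_i w_i x_i (y_i - exp(x_i' beta)) = 0 by Newton's method."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(max(np.average(y, weights=w), 1e-8))
    for _ in range(n_iter):
        mu = np.exp(np.clip(X @ beta, -20, 20))
        score = X.T @ (w * (y - mu))
        info = X.T @ (X * (w * mu)[:, None])
        step = np.linalg.solve(info, score)
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta


def holdout_loss(X, y, w, cols, train):
    """Weighted quadratic loss on the held-out rows for one covariate set."""
    beta = fit_weighted_loglinear(X[train][:, cols], y[train], w[train])
    test = ~train
    mu = np.exp(X[test][:, cols] @ beta)
    return np.sum(w[test] * (y[test] - mu) ** 2) / np.sum(w[test])


rng = np.random.default_rng(0)
n = 6000
X, y, w = simulate(n, rng)
train = np.arange(n) < n // 2  # first half fits, second half evaluates

candidates = {
    "intercept only": [0],
    "x1": [0, 1],
    "x1, x2, x3": [0, 1, 2, 3],
}
losses = {name: holdout_loss(X, y, w, cols, train)
          for name, cols in candidates.items()}
for name, loss in losses.items():
    print(f"{name:>15}: {loss:.3f}")
```

Rows with a missing response receive weight zero, so the simulated value stored in `y` for those rows never enters the fit or the loss. In practice the observation probabilities would have to be estimated, and a longitudinal analysis would also need a working correlation structure, neither of which this sketch attempts.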

Main Results:

The proposed model selection criterion effectively identifies relevant covariates without assuming a particular distribution for the data.
The method demonstrates robustness in scenarios with substantial excess zeros and non-monotone missingness.
The real data example on cardiovascular disease illustrates the practical utility of the approach.

Conclusions:

The developed distribution-free covariate selection method provides a valuable tool for analyzing complex longitudinal count data.
This approach enhances the reliability of statistical inference in the presence of excess zeros and missing data.
The findings have significant implications for studies in medical and social sciences where such data characteristics are common.