Logistic regression vs. predictive mean matching for imputing binary covariates.

Peter C Austin^1,2,3, Stef van Buuren^4,5

¹ICES, Toronto, ON, Canada.

Statistical Methods in Medical Research

September 26, 2023

Summary

This summary is machine-generated.

In simulations that use Multivariate Imputation using Chained Equations (MICE), predictive mean matching offers a faster alternative to logistic regression for imputing missing binary data, with comparable statistical performance.

Keywords:

Missing data; Monte Carlo simulations; multiple imputation

Area of Science:

Statistics
Computational Statistics
Data Science

Background:

Multivariate Imputation using Chained Equations (MICE) is a common method for handling missing data.
Parametric imputation (logistic regression) is standard for binary variables, but predictive mean matching is faster in R.
Limited research exists on predictive mean matching's statistical performance for binary variable imputation.
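
To make the idea of predictive mean matching for a binary variable concrete, here is a minimal Python sketch. It is a simplified illustration, not the implementation evaluated in the study: the function name `pmm_impute_binary`, the donor count `k=5`, and the use of an ordinary least-squares fit for the predicted mean are all assumptions, and the parameter-draw step that full multiple-imputation procedures usually include is omitted.

```python
import numpy as np

def pmm_impute_binary(x, y, missing, k=5, rng=None):
    """Simplified predictive mean matching for a binary variable y.

    x: (n, p) fully observed predictors.
    y: (n,) array of 0/1 values; entries where `missing` is True are ignored.
    missing: (n,) boolean mask marking the values to impute.
    k: number of candidate donors per missing case (assumed value).
    """
    rng = np.random.default_rng() if rng is None else rng
    design = np.column_stack([np.ones(len(x)), x])
    obs = ~missing
    # A least-squares fit on the observed cases supplies the "predicted mean".
    beta, *_ = np.linalg.lstsq(design[obs], y[obs].astype(float), rcond=None)
    pred = design @ beta
    donor_pred, donor_y = pred[obs], y[obs]
    imputed = y.copy()
    for i in np.flatnonzero(missing):
        # The k observed cases whose predicted means are closest to case i.
        nearest = np.argsort(np.abs(donor_pred - pred[i]))[:k]
        # Copy the observed value of one randomly chosen donor.
        imputed[i] = donor_y[rng.choice(nearest)]
    return imputed
```

Because every imputed value is copied from an observed donor, the result is always a valid 0 or 1, which is why the method can be applied to binary variables without a dedicated binary model.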

Purpose of the Study:

To compare the statistical performance of predictive mean matching against logistic regression for imputing missing binary variables.
To evaluate these methods under varying sample sizes and missing data prevalences.
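
For comparison, the parametric alternative can be sketched as follows: fit a logistic regression on the observed cases, then draw each missing value from a Bernoulli distribution with the fitted probability. This is a simplified sketch under stated assumptions. The names `fit_logistic` and `logreg_impute_binary`, the fixed iteration count, and the small ridge term are illustrative choices, and the coefficient-draw step used in full multiple imputation is left out.

```python
import numpy as np

def fit_logistic(design, y, n_iter=25):
    """Newton-Raphson fit of a logistic regression; returns the coefficients."""
    beta = np.zeros(design.shape[1])
    # Tiny ridge term (an assumption) keeps the Hessian invertible.
    ridge = 1e-8 * np.eye(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(design @ beta)))
        grad = design.T @ (y - p)
        hess = design.T @ (design * (p * (1.0 - p))[:, None]) + ridge
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def logreg_impute_binary(x, y, missing, rng=None):
    """Impute missing 0/1 values of y by Bernoulli draws from a fitted model."""
    rng = np.random.default_rng() if rng is None else rng
    design = np.column_stack([np.ones(len(x)), x])
    obs = ~missing
    beta = fit_logistic(design[obs], y[obs].astype(float))
    # Fitted probability that y = 1 for each case with a missing value.
    p_missing = 1.0 / (1.0 + np.exp(-(design[missing] @ beta)))
    imputed = y.copy()
    imputed[missing] = rng.random(p_missing.shape[0]) < p_missing
    return imputed
```

The iterative model fit is the step that predictive mean matching avoids, which is one plausible source of the difference in running time between the two approaches.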

Main Methods:

Monte Carlo simulations were employed to assess performance.
The analysis model of interest was a multivariable logistic regression.
Simulations varied sample sizes (N=250 to 10,000) and missing data prevalence (5% to 50%).
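
The simulation design described above could be organized roughly as in the skeleton below. Only the ranges (N = 250 to 10,000 and 5% to 50% missing) come from the summary. The intermediate grid points, the data-generating model, its coefficients, and the completely-at-random missingness mechanism are assumptions made for illustration, and the imputation and analysis steps are left as a placeholder.

```python
import itertools
import numpy as np

# Endpoints are from the summary; intermediate values are assumed.
SAMPLE_SIZES = [250, 1000, 10_000]
MISSING_PREVALENCES = [0.05, 0.25, 0.50]

def simulate_dataset(n, p_missing, rng):
    """One simulated data set with a partially missing binary covariate z.

    Assumed process: continuous x, binary z depending on x, binary outcome y
    from a logistic model in x and z, and missingness in z that is
    completely at random.
    """
    x = rng.normal(size=n)
    z = (rng.random(n) < 1.0 / (1.0 + np.exp(-0.5 * x))).astype(int)
    linear = -0.5 + 0.7 * x + 1.0 * z
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-linear))).astype(int)
    missing = rng.random(n) < p_missing
    return x, z, y, missing

def run_grid(n_reps, seed=0):
    """Loop over all scenarios; returns the mean realized missing fraction."""
    rng = np.random.default_rng(seed)
    results = {}
    for n, p in itertools.product(SAMPLE_SIZES, MISSING_PREVALENCES):
        fractions = []
        for _ in range(n_reps):
            x, z, y, missing = simulate_dataset(n, p, rng)
            # Placeholder: impute z with each method, fit the multivariable
            # logistic regression for y, and store the estimates.
            fractions.append(missing.mean())
        results[(n, p)] = float(np.mean(fractions))
    return results
```

Calling `run_grid(n_reps=5)` returns one entry per combination of sample size and missing-data prevalence; a real study would replace the placeholder with the imputation methods and record bias, variance, and coverage of the analysis-model estimates.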

Main Results:

Predictive mean matching demonstrated statistical performance virtually identical to logistic regression for binary variable imputation.
This equivalence held across diverse sample sizes and missing data percentages.
Predictive mean matching substantially reduced computational time in simulations.

Conclusions:

Predictive mean matching is a statistically sound method for imputing missing binary variables.
It offers substantial computational efficiency gains for multiple imputation simulations.
Researchers can confidently use predictive mean matching for binary data imputation, especially when speed is a concern.