



Logistic regression vs. predictive mean matching for imputing binary covariates.

Peter C Austin (1,2,3), Stef van Buuren (4,5)

  • (1) ICES, Toronto, ON, Canada.

Statistical Methods in Medical Research | September 26, 2023
Summary
This summary is machine-generated.

Within Multivariate Imputation using Chained Equations (MICE), predictive mean matching imputes missing binary variables faster than logistic regression and with comparable statistical performance, which makes it a practical alternative in simulation studies.

Keywords:
Missing data; Monte Carlo simulations; multiple imputation


Area of Science:

  • Statistics
  • Computational Statistics
  • Data Science

Background:

  • Multivariate Imputation using Chained Equations (MICE) is a common method for handling missing data.
  • Parametric imputation (logistic regression) is standard for binary variables, but predictive mean matching is faster in R.
  • Limited research exists on predictive mean matching's statistical performance for binary variable imputation.
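To make the two approaches concrete, here is a minimal Python sketch of imputing one binary variable from one fully observed covariate. It is an illustration, not the authors' code and not the implementation in any R package. The function names, the simulated data, and the donor pool size `k=5` are assumptions. Both imputers are simplified: they use point estimates of the coefficients, whereas a proper multiple-imputation procedure would also draw the coefficients to reflect their uncertainty.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Fit logistic regression by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)
        hess = X.T @ (X * w[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def impute_logistic(X_obs, y_obs, X_mis, rng):
    """Parametric imputation: draw each missing value from its fitted Bernoulli."""
    beta = fit_logistic(X_obs, y_obs)
    p_mis = 1.0 / (1.0 + np.exp(-X_mis @ beta))
    return (rng.random(len(p_mis)) < p_mis).astype(int)

def impute_pmm(X_obs, y_obs, X_mis, rng, k=5):
    """Predictive mean matching: borrow an observed value from one of the
    k observed cases whose linear prediction is closest (k is an assumption)."""
    beta, *_ = np.linalg.lstsq(X_obs, y_obs, rcond=None)  # linear model on the 0/1 codes
    pred_obs = X_obs @ beta
    pred_mis = X_mis @ beta
    imputed = np.empty(len(pred_mis), dtype=int)
    for i, pm in enumerate(pred_mis):
        donors = np.argsort(np.abs(pred_obs - pm))[:k]
        imputed[i] = y_obs[rng.choice(donors)]
    return imputed

# Hypothetical data: binary z depends on x; about 30% of z is set missing at random.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
z = (rng.random(n) < 1.0 / (1.0 + np.exp(-(0.5 + x)))).astype(int)
missing = rng.random(n) < 0.3
X = np.column_stack([np.ones(n), x])

z_logit = impute_logistic(X[~missing], z[~missing], X[missing], rng)
z_pmm = impute_pmm(X[~missing], z[~missing], X[missing], rng)
print("imputed prevalence, logistic:", z_logit.mean())
print("imputed prevalence, PMM:     ", z_pmm.mean())
```

Because predictive mean matching copies observed values from donors, every imputed value is a valid 0 or 1 even though the matching model is linear. It also needs only a least-squares fit rather than an iterative one, which is one plausible reason for a speed difference.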

Purpose of the Study:

  • To compare the statistical performance of predictive mean matching against logistic regression for imputing missing binary variables.
  • To evaluate these methods under varying sample sizes and missing data prevalences.

Main Methods:

  • Monte Carlo simulations were employed to assess performance.
  • The analysis model of interest was a multivariable logistic regression.
  • Simulations varied sample sizes (N=250 to 10,000) and missing data prevalence (5% to 50%).
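A simulation design of this kind can be sketched as a grid of scenarios plus a function that summarizes the replicate estimates for each one. The sketch below is an assumption-laden illustration: only the endpoints of the grid (250 to 10,000 and 5% to 50%) come from the summary, the intermediate grid values are hypothetical, and the performance metrics (relative bias, empirical standard error, 95% confidence interval coverage) are common choices in simulation studies rather than ones the summary names.

```python
import itertools
import numpy as np

# Endpoints are taken from the summary; the intermediate values are hypothetical.
SAMPLE_SIZES = [250, 1000, 10000]
MISSING_PREVALENCES = [0.05, 0.25, 0.50]

def summarize(estimates, std_errors, truth):
    """Summarize replicate estimates of one coefficient against its true value."""
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    lower = estimates - 1.96 * std_errors
    upper = estimates + 1.96 * std_errors
    return {
        "relative_bias": (estimates.mean() - truth) / truth,
        "empirical_se": estimates.std(ddof=1),
        "coverage": float(np.mean((lower <= truth) & (truth <= upper))),
    }

# Every (sample size, missingness prevalence) pair is one scenario.
scenarios = list(itertools.product(SAMPLE_SIZES, MISSING_PREVALENCES))
print(len(scenarios), "scenarios")

# Toy check of summarize(): unbiased estimates with a correctly stated standard error.
rng = np.random.default_rng(1)
truth = 1.0
toy_estimates = rng.normal(loc=truth, scale=0.1, size=1000)
toy_summary = summarize(toy_estimates, np.full(1000, 0.1), truth)
print(toy_summary)
```

In a full study, each scenario would repeat the cycle of generating data, deleting values, imputing with each method, and fitting the logistic analysis model, and `summarize` would then be applied to the pooled estimates from each method.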

Main Results:

  • Predictive mean matching demonstrated statistical performance virtually identical to logistic regression for binary variable imputation.
  • This equivalence held across diverse sample sizes and missing data percentages.
  • Predictive mean matching substantially reduced computational time in simulations.

Conclusions:

  • Predictive mean matching is a statistically sound method for imputing missing binary variables.
  • It offers substantial computational efficiency gains for multiple imputation simulations.
  • Researchers can confidently use predictive mean matching for binary data imputation, especially when speed is a concern.