
How should variable selection be performed with multiply imputed data?

Angela M Wood (1), Ian R White, Patrick Royston

  • (1) Department of Public Health and Primary Care, University of Cambridge, Strangeways Research Laboratory, Worts Causeway, Cambridge CB2 8RN, UK. amw79@medschl.cam.ac.uk

Statistics in Medicine | January 19, 2008
Summary
This summary is machine-generated.

Variable selection with incomplete data is challenging. Among the approaches compared for multiply imputed datasets, only selection based on Rubin's rules (RR) preserved the Type 1 error, so it is the recommended approach.

Area of Science:

  • Statistics
  • Biostatistics
  • Data Science

Background:

  • Multiple imputation is widely used for handling incomplete data.
  • Rubin's rules are well established for estimating parameters and standard errors from imputed data, but no guidelines exist for variable selection.
  • The usual practice, selecting variables among the complete cases only, is simple but inefficient and potentially biased.
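
For reference, Rubin's rules for a single parameter are usually written as below. The notation is a common textbook convention assumed here, not taken from this summary: m is the number of imputed datasets, and \hat{Q}_i and U_i are the estimate and its variance from dataset i.

```latex
\begin{align*}
\bar{Q} &= \frac{1}{m}\sum_{i=1}^{m}\hat{Q}_i, &
\bar{U} &= \frac{1}{m}\sum_{i=1}^{m} U_i, \\
B &= \frac{1}{m-1}\sum_{i=1}^{m}\bigl(\hat{Q}_i-\bar{Q}\bigr)^2, &
T &= \bar{U} + \Bigl(1+\frac{1}{m}\Bigr)B .
\end{align*}
```

Here \bar{Q} is the pooled estimate, \bar{U} the within-imputation variance, B the between-imputation variance, and T the total variance used for the pooled standard error.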

Purpose of the Study:

  • To evaluate different methods for variable selection in multiply imputed datasets.
  • To compare the performance of complete-case analysis, repeated use of Rubin's rules, and a stacked method.
  • To identify the most reliable approach for maintaining statistical integrity.

Main Methods:

  • Simulations based on a community psychiatry trial.
  • Comparison of variable selection strategies including complete-case analysis, repeated use of Rubin's rules, and a weighted stacked dataset approach.
  • Evaluation of parameter estimation and Type 1 error rates.

Main Results:

  • Most methods outperformed the naive complete-case analysis.
  • Type 1 error rates were only preserved when variable selection was based on Rubin's rules.
  • The stacked method offered an approximation but did not fully preserve Type 1 error.

Conclusions:

  • Rubin's rules provide a statistically sound method for variable selection with multiply imputed data.
  • Complete-case analysis is inefficient and potentially biased.
  • The recommended approach for variable selection in multiply imputed datasets is to use Rubin's rules to maintain Type 1 error control.
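
A minimal sketch of the pooling step behind selection by Rubin's rules, assuming the coefficient of one candidate variable has already been estimated in each imputed dataset. The function name `pool_rubin` and the numbers in the usage line are illustrative assumptions and do not come from the study.

```python
from math import inf, sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets with Rubin's rules.

    estimates: per-dataset coefficient estimates (hypothetical inputs)
    variances: per-dataset squared standard errors of that coefficient
    Returns (pooled estimate, total variance, degrees of freedom, Wald statistic).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")

    q_bar = mean(estimates)       # pooled point estimate
    u_bar = mean(variances)       # within-imputation variance
    b = variance(estimates)       # between-imputation variance (m - 1 denominator)
    total = u_bar + (1 + 1 / m) * b

    # Rubin's degrees of freedom for the reference t distribution;
    # with no between-imputation variation the reference is effectively normal.
    if b > 0:
        df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    else:
        df = inf

    wald = q_bar / sqrt(total)    # compared against a t distribution with df
    return q_bar, total, df, wald


# Illustrative numbers only: three imputations of one coefficient.
print(pool_rubin([0.5, 0.7, 0.6], [0.04, 0.05, 0.03]))
```

In a backward-elimination loop, this statistic would be recomputed for every remaining candidate after refitting the model in each imputed dataset, and the least significant variable dropped if its test fails the chosen threshold.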