A nonparametric multiple imputation approach for missing categorical data.

Muhan Zhou¹, Yulei He², Mandi Yu³

  • ¹Department of Epidemiology and Biostatistics, Mel and Enid Zuckerman College of Public Health, University of Arizona, 1295 N. Martin Ave., Tucson, 85724, USA.

BMC Medical Research Methodology | June 8, 2017
PubMed
Summary
This summary is machine-generated.

This study introduces a nearest-neighbor multiple imputation method for public health data with missing categorical variables. The approach uses nonresponse probabilities when selecting donors and gives stable estimates, including in settings where calibration estimators become unstable because missingness probabilities are very high.

Keywords: Categorical data; Double robustness; Missing at Random; Multiple imputation; Nearest neighbour

Area of Science:

  • Public Health
  • Biostatistics
  • Data Science

Background:

  • Incomplete categorical variables with multiple categories are prevalent in public health datasets.
  • Existing missing-data imputation methods often fail to leverage nonresponse probabilities.
  • Accurate handling of missing categorical data is crucial for reliable public health research.

Purpose of the Study:

  • To propose a novel nearest-neighbor multiple imputation method for categorical outcomes.
  • To estimate category proportions using information from missingness probabilities.
  • To address limitations of existing methods in handling complex missing data patterns.

Main Methods:

  • Developed a nearest-neighbor multiple imputation technique for categorical outcomes that are missing at random (MAR).
  • Constructed a donor set for each incomplete case using distances based on predictive scores from the outcome model and the missingness model.
  • Employed multinomial logistic regression for the outcome model and logistic regression for the missingness model, combined via a weighting scheme.
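
The donor-set step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the `-1` code for a missing outcome, the standardisation of scores, the squared-distance form, and the default weight and donor-set size are all assumptions made for the example. The predictive scores are taken as inputs and are assumed to come from models fitted beforehand (for instance a multinomial logistic regression for the outcome and a logistic regression for missingness). Refinements that a full multiple imputation procedure would normally include, such as refitting the models on bootstrap samples for each imputation, are omitted.

```python
import numpy as np


def nn_multiple_impute(y, outcome_score, missing_score, w_outcome=0.8,
                       n_donors=5, n_imputations=10, seed=0):
    """Hypothetical sketch of nearest-neighbour multiple imputation.

    y             : 1-D integer array of category codes, -1 marks a missing value.
    outcome_score : (n,) or (n, k) predictive scores from an outcome model.
    missing_score : (n,) predictive scores from a missingness model.
    w_outcome     : weight on the outcome scores; the missingness score gets
                    1 - w_outcome (an assumed weighting scheme).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    observed = np.flatnonzero(y >= 0)
    missing = np.flatnonzero(y < 0)

    def standardise(a):
        # Put every score column on a comparable scale before weighting.
        a = np.asarray(a, dtype=float).reshape(len(y), -1)
        sd = a.std(axis=0)
        sd[sd == 0] = 1.0
        return (a - a.mean(axis=0)) / sd

    so = standardise(outcome_score)
    sm = standardise(missing_score)

    completed = []
    for _ in range(n_imputations):
        y_imp = y.copy()
        for i in missing:
            # Weighted squared distance from case i to every observed case.
            d = (w_outcome * ((so[observed] - so[i]) ** 2).sum(axis=1)
                 + (1 - w_outcome) * ((sm[observed] - sm[i]) ** 2).sum(axis=1))
            # Donor set: the n_donors observed cases closest to case i.
            donors = observed[np.argsort(d)[:n_donors]]
            # Impute by drawing one donor at random and copying its category.
            y_imp[i] = y[rng.choice(donors)]
        completed.append(y_imp)
    return completed


def category_proportions(completed, n_categories):
    """Average the category proportions over the completed data sets."""
    props = [np.bincount(c, minlength=n_categories) / len(c) for c in completed]
    return np.mean(props, axis=0)
```

Calling `nn_multiple_impute` returns one completed copy of `y` per imputation, and `category_proportions` averages the per-copy proportions into a single point estimate. Variance estimation across imputations is not shown.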

Main Results:

  • The proposed method performs well and remains stable, especially when missingness probabilities are not extreme.
  • It outperforms calibration estimators, which can be unstable when missingness probabilities are very high.
  • The results highlight the importance of choosing weights that balance the contributions of the outcome and missingness models.
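
One way to picture how the weights balance the two models is a weighted distance between an incomplete case $i$ and a candidate donor $j$. The form below is an illustrative assumption rather than the formula from the paper; $S_o$ and $S_m$ denote standardised predictive scores from the outcome and missingness models, and $\omega_o$, $\omega_m$ are hypothetical weight symbols:

```latex
d(i, j) = \sqrt{\,\omega_o \,\bigl\lVert S_o(i) - S_o(j) \bigr\rVert^2
                + \omega_m \,\bigl( S_m(i) - S_m(j) \bigr)^2 \,},
\qquad \omega_o + \omega_m = 1, \quad \omega_o, \omega_m \ge 0 .
```

Under this form, $\omega_o = 1$ selects donors on the outcome model alone and $\omega_m = 1$ selects them on the missingness model alone. Intermediate values draw on both models.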

Conclusions:

  • The proposed multiple imputation method is a viable strategy for addressing missing multi-level categorical outcome data.
  • The authors recommend multinomial logistic regression for predicting the outcome and binary logistic regression for predicting the missingness probability.
  • Offers a robust approach for assessing outcome distributions in the presence of missing categorical data.