A nonparametric multiple imputation approach for missing categorical data.

Muhan Zhou¹, Yulei He², Mandi Yu³

¹Department of Epidemiology and Biostatistics, Mel and Enid Zuckerman College of Public Health, University of Arizona, 1295 N. Martin Ave., Tucson, 85724, USA.

BMC Medical Research Methodology

June 8, 2017

Summary

This summary is machine-generated.

This study introduces a nearest-neighbor multiple imputation method for public health data with missing categorical variables. The approach selects donors using both an outcome model and estimated nonresponse probabilities, and it gives stable estimates when missingness probabilities are not extreme.

Keywords:

Categorical data; Double robustness; Missing at Random; Multiple imputation; Nearest neighbour

Area of Science:

Public Health
Biostatistics
Data Science

Background:

Incomplete categorical variables with multiple categories are prevalent in public health datasets.
Existing missing-data imputation methods often fail to leverage nonresponse probabilities.
Accurate handling of missing categorical data is crucial for reliable public health research.

Purpose of the Study:

To propose a novel nearest-neighbor multiple imputation method for categorical outcomes.
To estimate category proportions using information from missingness probabilities.
To address limitations of existing methods in handling complex missing data patterns.

Main Methods:

Developed a nearest-neighbor multiple imputation technique for categorical outcomes that are missing at random.
Constructed a donor set for each incomplete case using distances based on predictive scores from the outcome and missingness models.
Employed multinomial logistic regression for the outcome model and logistic regression for the missingness model, combined via a weighting scheme.
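
The donor-selection idea above can be illustrated with a minimal sketch. It is not the authors' implementation: the predictive scores are assumed to be already computed by the two working models and placed on comparable scales, and the weighted Euclidean distance, the function names, the toy data, and the defaults (`k`, `w_outcome`, `m`) are all hypothetical choices made for illustration. The sketch also skips any step that would propagate uncertainty in the fitted models, so it shows the point estimate only.

```python
import math
import random
from collections import Counter


def weighted_distance(score_a, score_b, w_outcome):
    """Distance between two units based on their predictive scores.

    Each score is (outcome_scores, missingness_score). w_outcome in [0, 1]
    weights the outcome model; 1 - w_outcome weights the missingness model.
    The exact form of this distance is an assumption of the sketch.
    """
    out_a, miss_a = score_a
    out_b, miss_b = score_b
    d_out = sum((a - b) ** 2 for a, b in zip(out_a, out_b))
    d_miss = (miss_a - miss_b) ** 2
    return math.sqrt(w_outcome * d_out + (1.0 - w_outcome) * d_miss)


def impute_once(y, scores, k, w_outcome, rng):
    """Return a completed copy of y; each None is drawn from its k nearest donors."""
    observed = [i for i, v in enumerate(y) if v is not None]
    completed = list(y)
    for i, v in enumerate(y):
        if v is not None:
            continue
        # Donor set: the k observed units closest to unit i.
        donors = sorted(
            observed,
            key=lambda j: weighted_distance(scores[i], scores[j], w_outcome),
        )[:k]
        completed[i] = y[rng.choice(donors)]
    return completed


def pooled_proportions(y, scores, k=2, w_outcome=0.8, m=20, seed=0):
    """Average the category proportions over m imputed data sets."""
    rng = random.Random(seed)
    categories = sorted({v for v in y if v is not None})
    totals = Counter()
    for _ in range(m):
        counts = Counter(impute_once(y, scores, k, w_outcome, rng))
        for c in categories:
            totals[c] += counts[c] / len(y)
    return {c: totals[c] / m for c in categories}


# Toy data (hypothetical): three categories, two units with missing outcomes.
y = ["a", "a", "b", "b", "c", None, None]
scores = [
    ((2.0, 0.0), 0.1),
    ((1.8, 0.1), 0.2),
    ((0.0, 2.0), 0.3),
    ((0.1, 1.9), 0.4),
    ((-1.0, -1.0), 0.5),
    ((1.9, 0.0), 0.6),
    ((0.0, 1.8), 0.7),
]
props = pooled_proportions(y, scores)
print(props)
```

Setting `w_outcome` close to 1 makes the donor set depend almost entirely on the outcome model, while values near 0 favor the missingness model. In this toy example both of the two nearest donors for each incomplete unit share a category, so the result does not depend on the random draw.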

Main Results:

The proposed method demonstrates good performance and stability, especially when missingness probabilities are not extreme.
Outperforms calibration estimators, which can become unstable when some missingness probabilities are very high.
Highlights the importance of selecting appropriate weights to balance model contributions for optimal imputation.

Conclusions:

The proposed multiple imputation method is a viable strategy for addressing missing multi-level categorical outcome data.
Recommends using multinomial logistic regression for outcome prediction and binary logistic regression for missingness probability prediction.
Offers a robust approach for assessing outcome distributions in the presence of missing categorical data.