Benchmarking Missing Data Imputation Methods for Time Series Using Real-World Test Cases.

Adedolapo Aishat Toye¹, Asuman Celik¹, Samantha Kleinberg¹

¹Department of Computer Science, Stevens Institute of Technology, USA.

Proceedings of Machine Learning Research, September 2, 2025

Summary

This summary is machine-generated.

Imputation methods for healthcare time series perform best when data are missing completely at random and worse under realistic missingness patterns. Linear interpolation had the lowest error across all missingness mechanisms tested, highlighting the need for better evaluation practices and imputation techniques for complex missingness.

Area of Science:

Healthcare data science
Biostatistics
Machine learning in medicine

Background:

Missing data is a significant challenge in healthcare analytics.
Current imputation methods are often evaluated on unrealistic missing data patterns.
Real-world missingness mechanisms (missing completely at random, MCAR; missing at random, MAR; not missing at random, NMAR) require robust imputation strategies.
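For reference, these three mechanisms have standard formal definitions from Rubin's missing-data framework; the summary itself does not state them. Writing M for the missingness indicators and splitting the data into an observed part Y_obs and a missing part Y_mis, the usual statements are:

```latex
\begin{aligned}
\text{MCAR:}\quad & P(M \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}) = P(M) \\
\text{MAR:}\quad  & P(M \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}) = P(M \mid Y_{\mathrm{obs}}) \\
\text{NMAR:}\quad & P(M \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}) \text{ depends on } Y_{\mathrm{mis}} \text{ even after conditioning on } Y_{\mathrm{obs}}
\end{aligned}
```

Under MCAR the observed points are an unbiased subsample, which helps explain why imputation tends to look easiest in that setting.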

Purpose of the Study:

To assess the real-world accuracy of 12 imputation methods across three missing data mechanisms (MCAR, MAR, NMAR).
To compare imputation performance on continuous glucose monitoring and heart rate time series data.
To evaluate the impact of missingness percentages (5-30%) on imputation accuracy.
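A minimal sketch of how missingness at a chosen rate (for example 5% to 30%) might be simulated under each mechanism on a single series. The function name `make_mask`, the optional fully observed `covariate` that drives MAR, and the rule of hiding the top-ranked points are illustrative assumptions, not the authors' protocol.

```python
import random

def make_mask(values, rate, mechanism, covariate=None, seed=0):
    """Return a list of booleans in which True marks a point to hide.

    Hypothetical illustration of the three mechanisms:
      MCAR: points are hidden uniformly at random.
      MAR:  the points with the highest values of a fully observed
            covariate are hidden (missingness depends on observed data).
      NMAR: the points with the highest values of the series itself are
            hidden (missingness depends on the values that go missing).
    """
    n = len(values)
    k = round(rate * n)  # number of points to hide, e.g. rate=0.05 to 0.30
    if mechanism == "MCAR":
        hidden = random.Random(seed).sample(range(n), k)
    elif mechanism == "MAR":
        if covariate is None:
            raise ValueError("MAR needs a fully observed covariate")
        hidden = sorted(range(n), key=lambda i: covariate[i], reverse=True)[:k]
    elif mechanism == "NMAR":
        hidden = sorted(range(n), key=lambda i: values[i], reverse=True)[:k]
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    hidden = set(hidden)
    return [i in hidden for i in range(n)]
```

Hiding a deterministic top fraction keeps the MAR and NMAR cases easy to check; a more realistic simulation would instead hide points with probabilities that increase with the covariate or with the value itself.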

Main Methods:

Simulated missingness in the Loop (continuous glucose monitoring, CGM) and All of Us (heart rate) datasets according to MCAR, MAR, and NMAR mechanisms.
Tested 12 state-of-the-art and commonly used imputation methods.
Evaluated accuracy using root mean square error (RMSE) and bias metrics across demographic groups.
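The accuracy metrics named above can be sketched as follows, computed only at the positions that were artificially hidden, since those are the only points with a known true value to compare against. Defining bias as the mean signed error (imputed minus true) is an assumption; the summary does not give the paper's exact bias definition, and the function name `rmse_and_bias` is hypothetical.

```python
import math

def rmse_and_bias(true_values, imputed_values, mask):
    """RMSE and mean signed error over hidden positions (mask[i] is True)."""
    # Signed error at each hidden point: positive means overestimation.
    errors = [imp - true
              for true, imp, hidden in zip(true_values, imputed_values, mask)
              if hidden]
    if not errors:
        raise ValueError("mask hides no points")
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    bias = sum(errors) / len(errors)
    return rmse, bias
```

To compare demographic groups, the same function can be applied separately to the series belonging to each group and the results tabulated side by side.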

Main Results:

Imputation accuracy was significantly higher for missing completely at random (MCAR) data compared to missing at random (MAR) and not missing at random (NMAR) data.
Linear interpolation demonstrated the lowest RMSE and minimal bias across all tested mechanisms and demographic groups.
Existing evaluation practices may overestimate imputation method performance in real-world scenarios.
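Linear interpolation, the best-performing method in these results, fills each gap with a straight line between the nearest observed points on either side. A minimal sketch, assuming equally spaced samples so that list positions stand in for time; the name `linear_interpolate` and the rule of copying the nearest observed value into leading or trailing gaps are illustrative choices, not details from the paper.

```python
import bisect

def linear_interpolate(values, mask):
    """Fill hidden positions (mask[i] is True) by linear interpolation.

    Assumes equally spaced samples. Hidden points before the first or after
    the last observed point are filled with that nearest observed value.
    """
    observed = [i for i, hidden in enumerate(mask) if not hidden]
    if not observed:
        raise ValueError("no observed points to interpolate from")
    filled = list(values)
    for i, hidden in enumerate(mask):
        if not hidden:
            continue
        # Index in `observed` of the first observed position right of i.
        pos = bisect.bisect_left(observed, i)
        left = observed[pos - 1] if pos > 0 else None
        right = observed[pos] if pos < len(observed) else None
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            # Fraction of the way from the left to the right neighbour.
            t = (i - left) / (right - left)
            filled[i] = values[left] + t * (values[right] - values[left])
    return filled
```

The method needs no training data and no model of why values went missing, which is one reason it is a common baseline.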

Conclusions:

Current imputation method evaluations do not reflect real-world performance with realistic missing data patterns.
Linear interpolation offers a reliable baseline for imputation, even with complex missingness.
Further research should focus on developing improved evaluation methodologies and imputation techniques tailored to real-world missing data mechanisms.