



Clustering-Informed Shared-Structure Variational Autoencoder for Missing Data Imputation in Large-Scale Healthcare

Yasin Khadem Charvadeh¹, Kenneth Seier¹, Katherine S Panageas¹

  • ¹Department of Epidemiology & Biostatistics, Memorial Sloan Kettering Cancer Center, New York, New York, USA.

Statistics in Medicine | December 3, 2025
Summary
This summary is machine-generated.

We introduce a new method, the clustering-informed shared-structure variational autoencoder (CISS-VAE), to accurately impute missing data in electronic health records (EHR). The technique improves healthcare analytics by capturing complex nonlinear relationships and accommodating multiple missing data mechanisms, including missing not at random (MNAR).

Keywords:
electronic health records, missing data imputation, missing not at random, variational autoencoder

Area of Science:

  • Health Informatics
  • Machine Learning
  • Biostatistics

Background:

  • Missing data in electronic health records (EHR) and patient-reported outcomes hinders healthcare analytics.
  • Conventional imputation methods fail to capture complex nonlinear relationships and various missing data mechanisms, including missing not at random (MNAR).
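To make the distinction between missing data mechanisms concrete, the following is a minimal sketch that simulates two of them on synthetic values and compares the mean of the observed entries with the true mean. It is an illustration only, not taken from the paper: the function names `simulate_missingness` and `observed_mean`, the linear MNAR rule, and all numeric settings are assumptions chosen for the example.

```python
import random
import statistics


def simulate_missingness(values, mechanism, rng, rate=0.3):
    """Return a copy of `values` with some entries replaced by None.

    "MCAR": every entry is dropped with the same probability `rate`.
    "MNAR": the drop probability grows with the entry's own value
            (a hypothetical linear rule, assumed for this sketch).
    """
    lo, hi = min(values), max(values)
    out = []
    for v in values:
        if mechanism == "MCAR":
            p = rate
        elif mechanism == "MNAR":
            # Probability ranges from 0 (smallest value) to 2 * rate (largest).
            p = 2 * rate * (v - lo) / (hi - lo)
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        out.append(None if rng.random() < p else v)
    return out


def observed_mean(values):
    """Mean of the entries that are not missing."""
    return statistics.fmean(v for v in values if v is not None)


rng = random.Random(0)
# Synthetic "lab values"; the distribution is an assumption for illustration.
true_values = [rng.gauss(50.0, 10.0) for _ in range(20000)]
true_mean = statistics.fmean(true_values)

mcar_bias = observed_mean(simulate_missingness(true_values, "MCAR", rng)) - true_mean
mnar_bias = observed_mean(simulate_missingness(true_values, "MNAR", rng)) - true_mean

print(f"MCAR bias of observed mean: {mcar_bias:+.3f}")
print(f"MNAR bias of observed mean: {mnar_bias:+.3f}")
```

Under the MNAR rule assumed here, high values are preferentially missing, so the observed mean is pulled below the true mean, and any imputation that relies only on the observed distribution (such as filling with the observed mean) inherits that bias. Under MCAR the observed mean stays close to the true mean.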

Purpose of the Study:

  • To develop an advanced imputation method that effectively addresses the challenges of missing data in healthcare analytics.
  • To improve the accuracy and usability of EHR and patient-reported outcome data for health monitoring and analysis.

Main Methods:

  • Proposed the clustering-informed shared-structure variational autoencoder (CISS-VAE), a Bayesian neural network model.
  • Developed iterative learning algorithms to enhance imputation accuracy and prevent overfitting.
  • Validated the model through comprehensive simulations and application to real-world EHR data.
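One common way to validate an imputation method in simulation is to hide entries whose true values are known, impute them, and score the error on the hidden entries only. The sketch below shows that general idea under stated assumptions. It is not the authors' validation procedure or the CISS-VAE model: `column_mean_impute` is a deliberately simple stand-in imputer, and `holdout_rmse`, the holdout rate, and the synthetic data are hypothetical.

```python
import math
import random


def column_mean_impute(rows):
    """Stand-in imputer: fill each None with the mean of that column's observed values."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def holdout_rmse(complete_rows, imputer, rng, holdout_rate=0.2):
    """Mask a random subset of known entries, impute, and return RMSE on the masked entries."""
    masked = [list(r) for r in complete_rows]
    held_out = []
    for i, row in enumerate(masked):
        for j in range(len(row)):
            if rng.random() < holdout_rate:
                held_out.append((i, j))
                row[j] = None
    imputed = imputer(masked)
    squared = [(imputed[i][j] - complete_rows[i][j]) ** 2 for i, j in held_out]
    return math.sqrt(sum(squared) / len(squared))


rng = random.Random(42)
# Synthetic complete data with two columns of different spread (an assumption).
data = [[rng.gauss(0.0, 1.0), rng.gauss(10.0, 5.0)] for _ in range(500)]

rmse = holdout_rmse(data, column_mean_impute, random.Random(1))
print(f"Held-out RMSE of column-mean imputation: {rmse:.3f}")
```

Any imputer with the same call shape (a list of rows containing `None` in, a completed list of rows out) can be passed to `holdout_rmse`, which makes it straightforward to compare candidate methods on the same masked entries by reusing the same seed.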

Main Results:

  • The CISS-VAE model demonstrated superior accuracy compared to traditional and contemporary imputation methods in simulations.
  • The model effectively captures complex associations and accommodates various missing data mechanisms, including MNAR.
  • The model was successfully applied to EHR data from patients with early-stage breast cancer.

Conclusions:

  • The CISS-VAE model offers a powerful solution for mitigating the impact of missing data in healthcare analytics.
  • This approach enhances the reliability of health monitoring and analyses using EHR data.
  • The proposed method advances the field of health informatics by improving data imputation techniques.