Clustering-Informed Shared-Structure Variational Autoencoder for Missing Data Imputation in Large-Scale Healthcare

Yasin Khadem Charvadeh¹, Kenneth Seier¹, Katherine S Panageas¹

¹Department of Epidemiology & Biostatistics, Memorial Sloan Kettering Cancer Center, New York, New York, USA.

Statistics in Medicine

December 3, 2025

Summary

This summary is machine-generated.

We introduce a new method, the clustering-informed shared-structure variational autoencoder (CISS-VAE), to accurately impute missing data in electronic health records (EHR). The method improves healthcare analytics by capturing complex nonlinear relationships in the data and accommodating different missing data mechanisms, including missing not at random (MNAR).

Keywords:

electronic health records; missing data imputation; missing not at random; variational autoencoder

Area of Science:

Health Informatics
Machine Learning
Biostatistics

Background:

Missing data in electronic health records (EHR) and patient-reported outcomes hinders healthcare analytics.
Conventional imputation methods fail to capture complex nonlinear relationships and various missing data mechanisms, including missing not at random (MNAR).
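
To make the distinction between missing data mechanisms concrete, the following is a minimal sketch, not taken from the paper, that simulates one variable under missing completely at random (MCAR) and under MNAR. The function name `simulate_missingness`, the 30% MCAR rate, and the 60%/10% MNAR rates are illustrative assumptions. It shows why a naive summary of the observed values is roughly unbiased under MCAR but biased under MNAR.

```python
import random
import statistics


def simulate_missingness(n=10_000, seed=0):
    """Return (full, mcar_observed, mnar_observed) for a standard-normal variable.

    Hypothetical illustration only; the rates below are assumptions.
    """
    rng = random.Random(seed)
    full = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # MCAR: every value has the same 30% chance of being missing.
    mcar = [x for x in full if rng.random() >= 0.3]
    # MNAR: the chance of being missing depends on the unobserved value itself
    # (positive values are dropped 60% of the time, others 10% of the time).
    mnar = [x for x in full if rng.random() >= (0.6 if x > 0 else 0.1)]
    return full, mcar, mnar


full, mcar, mnar = simulate_missingness()
mean_full = statistics.fmean(full)
mean_mcar = statistics.fmean(mcar)  # close to mean_full
mean_mnar = statistics.fmean(mnar)  # pulled clearly below mean_full
```

Filling the MNAR gaps with the observed mean would carry this downward bias into every imputed value, which is the kind of failure the background statement refers to.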

Purpose of the Study:

To develop an advanced imputation method that effectively addresses the challenges of missing data in healthcare analytics.
To improve the accuracy and usability of EHR and patient-reported outcome data for health monitoring and analysis.

Main Methods:

Proposed the clustering-informed shared-structure variational autoencoder (CISS-VAE), a Bayesian neural network model.
Developed iterative learning algorithms to enhance imputation accuracy and prevent overfitting.
Validated the model through comprehensive simulations and application to real-world EHR data.
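
The summary does not describe the CISS-VAE architecture or its training algorithm, so the sketch below is only a much simpler stand-in for the general idea of combining cluster-specific and shared information. It assumes cluster labels are already given, and it replaces the neural network with column means: a missing entry is filled from its own cluster when that cluster has observed values, and from the pooled data otherwise. The function `cluster_mean_impute`, the toy rows, and the labels are all hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def cluster_mean_impute(rows, labels):
    """Fill None entries with the cluster's column mean, else the global column mean.

    Simplified stand-in for cluster-informed imputation; not the CISS-VAE model.
    """
    n_cols = len(rows[0])
    global_vals = [[] for _ in range(n_cols)]
    cluster_vals = defaultdict(lambda: [[] for _ in range(n_cols)])
    # First pass: collect observed values per column, pooled and per cluster.
    for row, lab in zip(rows, labels):
        for j, v in enumerate(row):
            if v is not None:
                global_vals[j].append(v)
                cluster_vals[lab][j].append(v)
    # Second pass: fill each gap from the most specific source available.
    imputed = []
    for row, lab in zip(rows, labels):
        new_row = []
        for j, v in enumerate(row):
            if v is not None:
                new_row.append(v)
            elif cluster_vals[lab][j]:
                new_row.append(fmean(cluster_vals[lab][j]))  # cluster-specific
            else:
                new_row.append(fmean(global_vals[j]))  # shared fallback
        imputed.append(new_row)
    return imputed


rows = [
    [1.0, 10.0],
    [3.0, 12.0],
    [2.0, None],
    [8.0, None],
    [None, None],
]
labels = ["a", "a", "a", "b", "b"]
filled = cluster_mean_impute(rows, labels)
```

In this toy input, cluster "b" has no observed values in the second column, so those entries fall back to the pooled mean. This loosely mirrors why a model would share structure across clusters instead of fitting each cluster in isolation.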

Main Results:

The CISS-VAE model demonstrated superior accuracy compared to traditional and contemporary imputation methods in simulations.
The model effectively captures complex associations and accommodates various missing data mechanisms, including MNAR.
The model was successfully applied to EHR data from patients with early-stage breast cancer.

Conclusions:

The CISS-VAE model offers a powerful solution for mitigating the impact of missing data in healthcare analytics.
This approach enhances the reliability of health monitoring and analyses using EHR data.
The proposed method advances the field of health informatics by improving data imputation techniques.