Utility Metrics for Evaluating Synthetic Health Data Generation Methods: Validation Study.

Khaled El Emam (1,2,3), Lucy Mosquera (2,3), Xi Fang (3)

  • (1) School of Epidemiology and Public Health, University of Ottawa, Ottawa, ON, Canada.

JMIR Medical Informatics | April 7, 2022
Summary
This summary is machine-generated.

Evaluating synthetic data generation (SDG) methods requires validated utility metrics. The multivariate Hellinger distance effectively ranks SDG methods for logistic regression prediction models in health research.

Keywords:
binary prediction model, data privacy, data utility, generative models, logistic regression, medical informatics, model validation, prediction model, synthetic data, synthetic data generation, utility metric

Area of Science:

  • Data Science
  • Machine Learning
  • Health Informatics

Background:

  • Evaluating synthetic data generation (SDG) methods is crucial for developers and users.
  • Existing utility metrics lack general validation for comparing SDG methods.

Purpose of the Study:

  • Assess common utility metrics' ability to rank SDG methods.
  • Focus on logistic regression prediction models for health research.

Main Methods:

  • Evaluated 6 utility metrics across 30 health datasets and 3 SDG methods.
  • Ranked methods based on prediction performance using logistic regression models.
  • Measured performance by comparing the AUC-ROC and AUC-PR obtained with synthetic data against those obtained with real data.
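
The comparison described in the methods above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the performance gap is the absolute difference between the AUC-ROC on real data and the AUC-ROC on synthetic data; this summary does not state the study's exact formula. The method names (`method_a`, `method_b`, `method_c`) and all numbers are hypothetical.

```python
def auc_roc(labels, scores):
    """AUC-ROC via the pairwise (Mann-Whitney) identity; tied scores count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_methods(real_auc, synthetic_aucs):
    """Order method names by |AUC_real - AUC_synthetic|, smallest gap first.

    Assumption: a smaller gap means the synthetic data preserved more utility.
    """
    gaps = {name: abs(real_auc - auc) for name, auc in synthetic_aucs.items()}
    return sorted(gaps, key=gaps.get)


# Toy holdout labels and predicted probabilities (made-up values).
y_true = [0, 0, 1, 1]
real_model_scores = [0.1, 0.4, 0.35, 0.8]
real_auc = auc_roc(y_true, real_model_scores)

# Hypothetical AUC-ROC values for models trained on each method's synthetic data.
synthetic_aucs = {"method_a": 0.70, "method_b": 0.55, "method_c": 0.74}
ranking = rank_methods(real_auc, synthetic_aucs)
print(real_auc, ranking)
```

A utility metric would then be judged by how closely its own ordering of the methods agrees with a performance-based ordering such as `ranking`.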

Main Results:

  • Multivariate Hellinger distance, computed with a Gaussian copula, ranked SDG methods best.
  • Of the 6 metrics evaluated, it best differentiated the prediction performance of the SDG methods.
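
For readers unfamiliar with the Hellinger distance, the following is a minimal sketch of its standard definitions, not the study's implementation. How the study applied the Gaussian copula is not described in this summary; the sketch assumes the real and synthetic data have already been mapped to Gaussian space and summarized by a mean vector and covariance matrix. The function names are hypothetical.

```python
import numpy as np


def hellinger_discrete(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()  # normalize so counts are accepted as well as probabilities
    q = q / q.sum()
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))


def hellinger_gaussian(mu1, cov1, mu2, cov2):
    """Closed-form Hellinger distance between two multivariate normal distributions."""
    mu1, mu2 = np.asarray(mu1, dtype=float), np.asarray(mu2, dtype=float)
    cov1, cov2 = np.asarray(cov1, dtype=float), np.asarray(cov2, dtype=float)
    avg = 0.5 * (cov1 + cov2)
    # Log-determinants keep the computation stable in higher dimensions.
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    _, logdet_avg = np.linalg.slogdet(avg)
    diff = mu1 - mu2
    maha = diff @ np.linalg.solve(avg, diff)
    # Log of the Bhattacharyya coefficient; H^2 = 1 - BC.
    log_bc = 0.25 * logdet1 + 0.25 * logdet2 - 0.5 * logdet_avg - 0.125 * maha
    return float(np.sqrt(max(0.0, 1.0 - np.exp(log_bc))))


identity = np.eye(2)
h_same = hellinger_gaussian([0, 0], identity, [0, 0], identity)
h_shift = hellinger_gaussian([0, 0], identity, [2, 0], identity)
print(h_same, h_shift)
```

Under this reading, a distance near 0 would indicate that the synthetic data closely matches the real data's joint distribution, and a distance near 1 would indicate a poor match.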

Conclusions:

  • Validated the multivariate Hellinger distance as a reliable utility metric for SDG methods.
  • This metric enables effective evaluation and comparison of competing SDG methods.