Generative adversarial networks for imputing missing data for big data clinical research.

Weinan Dong¹, Daniel Yee Tak Fong², Jin-Sun Yoon³

  • ¹Department of Family Medicine and Primary Care, Faculty of Medicine, University of Hong Kong, Hong Kong, Hong Kong SAR, China.

BMC Medical Research Methodology | April 21, 2021
Summary
This summary is machine-generated.

Generative adversarial imputation nets (GAIN) imputed missing clinical data more accurately than multivariate imputation by chained equations (MICE) and, at high missingness, more accurately than missForest. GAIN was also much faster, which makes it an efficient option for big data clinical research.

Keywords:
Big data; Clinical research; Generative adversarial network; Machine learning; Missing data imputation

Area of Science:

  • Machine Learning in Clinical Research
  • Data Imputation Techniques
  • Big Data Analytics

Background:

  • Missing data is a significant challenge in clinical research.
  • Generative Adversarial Imputation Nets (GAIN) is a novel machine learning approach for data imputation.
  • GAIN had not yet been evaluated on large, real-world clinical datasets.
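To make the GAIN idea more concrete, the NumPy sketch below shows the quantities the original GAIN formulation works with. A generator fills in the missing entries. A discriminator guesses, entry by entry, which values were observed. A "hint" reveals part of the missingness mask to the discriminator. The networks themselves are omitted: `g_out` and `d_prob` stand in for whatever generator and discriminator produce (the discriminator would see the imputed matrix plus the hint). All function names are hypothetical. The defaults `alpha=100`, `hint_rate=0.9` and the noise scale 0.01 are illustrative assumptions, not settings taken from this study. Drawing the hint mask i.i.d. Bernoulli is also a practical simplification of the original scheme.

```python
import numpy as np


def generator_input(x, m, rng, noise_scale=0.01):
    """Build the generator's input: observed values, small noise in the gaps, plus the mask."""
    x = np.nan_to_num(x)  # missing entries (NaN) become 0 before masking
    z = rng.uniform(0.0, noise_scale, size=x.shape)  # illustrative noise scale (assumption)
    x_tilde = m * x + (1 - m) * z
    return np.concatenate([x_tilde, m], axis=1)


def sample_hint(m, rng, hint_rate=0.9):
    """Hint H = B*M + 0.5*(1 - B): true mask value where B == 1, uninformative 0.5 elsewhere.

    B is drawn i.i.d. Bernoulli(hint_rate), a simplification (assumption).
    """
    b = (rng.uniform(size=m.shape) < hint_rate).astype(float)
    return b * m + 0.5 * (1 - b)


def gain_losses(x, m, g_out, d_prob, alpha=100.0, eps=1e-8):
    """Return the imputed matrix and the discriminator and generator losses.

    x:      data scaled to [0, 1], NaN where missing
    m:      mask, 1 = observed, 0 = missing
    g_out:  generator output for every entry
    d_prob: discriminator's probability that each entry was observed
    """
    x = np.nan_to_num(x)
    # Keep observed values; use the generator's values only where data are missing.
    x_hat = m * x + (1 - m) * g_out
    # Discriminator: per-entry cross-entropy for "observed (1) vs imputed (0)".
    d_loss = -np.mean(m * np.log(d_prob + eps) + (1 - m) * np.log(1 - d_prob + eps))
    # Generator, adversarial term: make imputed entries look observed to the discriminator.
    g_adv = -np.mean((1 - m) * np.log(d_prob + eps))
    # Generator, reconstruction term: reproduce the observed entries it was given.
    mse = np.sum((m * (x - g_out)) ** 2) / np.sum(m)
    return x_hat, d_loss, g_adv + alpha * mse
```

In a full training loop, the discriminator would be updated on `d_loss` and the generator on the combined generator loss, alternating between the two.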

Purpose of the Study:

  • Evaluate the accuracy of GAIN for imputing missing values in large clinical datasets with mixed variable types.
  • Assess the computational efficiency of GAIN.
  • Compare GAIN's performance against MICE and missForest.
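Handling "mixed variable types" with a GAN imputer usually requires some preprocessing, because the networks work on numbers in a common range. The sketch below shows one common approach, assuming integer-coded categorical variables with consecutive codes. Each column is min-max scaled to [0, 1] before imputation and scaled back afterwards. Imputed categorical values are then rounded to the nearest code. The helper names are hypothetical, and the summary does not say whether the study used exactly this scheme.

```python
import numpy as np


def minmax_normalize(x):
    """Scale each column to [0, 1], ignoring NaNs; return the data and the scaling parameters."""
    lo = np.nanmin(x, axis=0)
    span = np.nanmax(x, axis=0) - lo
    span = np.where(span == 0, 1.0, span)  # avoid division by zero for constant columns
    return (x - lo) / span, lo, span


def renormalize(x_norm, lo, span):
    """Undo minmax_normalize using the stored parameters."""
    return x_norm * span + lo


def round_categorical(x_imputed, categorical_cols):
    """Snap imputed values of integer-coded categorical columns to the nearest code."""
    out = x_imputed.copy()
    out[:, categorical_cols] = np.round(out[:, categorical_cols])
    return out
```

Rounding after renormalization keeps categorical outputs valid, but it treats codes as ordered. One-hot encoding is a common alternative when categories have no natural order.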

Main Methods:

  • Used two large, real-world clinical datasets (diabetes and hypertension cohorts).
  • Simulated data missing at random (MAR) at missingness rates of 20% and 50%.
  • Measured imputation accuracy with the normalized root mean square error (NRMSE) for continuous variables and the proportion of falsely classified entries (PFC) for categorical variables.
  • Recorded computation time for each imputation method.
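The evaluation design above can be sketched as follows. The summary does not describe the exact MAR mechanism, so `simulate_mar` uses a hypothetical rank-based one. The probability that a target entry is removed rises linearly with the rank of a fully observed "driver" column, which makes missingness depend only on observed data. The average removal probability equals `rate` (valid for `rate <= 0.5`). The two metrics follow the forms commonly used alongside missForest: NRMSE is computed over the missing entries of a continuous variable and normalized by their variance, and PFC is the share of missing categorical entries imputed with the wrong category. Function names are illustrative.

```python
import numpy as np


def simulate_mar(x, driver_col, target_cols, rate, rng):
    """Return a float copy of x with MAR missingness (NaN) in target_cols.

    driver_col must not be in target_cols; it stays fully observed.
    """
    n = x.shape[0]
    ranks = np.argsort(np.argsort(x[:, driver_col]))  # ranks 0 .. n-1 of the driver values
    p_row = np.clip(2 * rate * ranks / (n - 1), 0.0, 1.0)  # mean probability == rate
    x_miss = x.astype(float)  # astype returns a copy
    for j in target_cols:
        drop = rng.uniform(size=n) < p_row
        x_miss[drop, j] = np.nan
    return x_miss


def nrmse(x_true, x_imp, miss):
    """NRMSE over the missing entries of a continuous variable."""
    t, i = x_true[miss], x_imp[miss]
    return np.sqrt(np.mean((t - i) ** 2) / np.var(t))


def pfc(x_true, x_imp, miss):
    """Proportion of falsely classified entries among missing categorical values."""
    return np.mean(x_true[miss] != x_imp[miss])
```

Any imputer (MICE, missForest, GAIN) can then be scored by passing the original values, its imputed output and the boolean mask of simulated gaps to `nrmse` or `pfc`.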

Main Results:

  • GAIN and missForest were more accurate than MICE.
  • GAIN demonstrated superior accuracy over missForest at 50% missingness.
  • GAIN excelled in imputing skewed continuous and imbalanced categorical variables.
  • GAIN was much faster to compute (32 min vs. 1,300 min for missForest on 50,000 cases).
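As a rough check on the scale of the timing difference, dividing the two reported times gives about a forty-fold speed-up. This is simple arithmetic on the figures above, not a value reported in the summary:

```latex
\frac{1300\ \text{min}}{32\ \text{min}} \approx 40.6,
\qquad
1300\ \text{min} \approx 21.7\ \text{h}
```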

Conclusions:

  • GAIN imputes missing data in large clinical datasets more accurately and efficiently than MICE and missForest.
  • GAIN is robust to high missingness rates (up to 50%).
  • GAIN's speed and accuracy make it a promising tool for big clinical data research.