

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

Interventions for managing clinically relevant sleep disturbances or insomnia in cancer patients and survivors: an up-to-date systematic review and meta-analysis of self-reported sleep disturbance.

Frontiers in psychology·2026

Same author

Association of Reduction in Continuity of Care During COVID-19 Pandemic With Cardiovascular Diseases, Kidney Failure and All-Cause Mortality for People With Diabetes: A Cohort Study in Hong Kong.

Diabetes, obesity & metabolism·2026

Same author

The optimal blood pressure target in old and very old patients with hypertension.

Age and ageing·2026

Same author

Optimal blood pressure target in patients with uncomplicated hypertension: a target trial emulation study.

Nature communications·2026

Same author

Development and usability testing of 'Eating Smart' - A mobile application for promoting healthy eating in Chinese colorectal cancer survivors and high-risk populations.

International journal of medical informatics·2026

Same author

Indirect effect of the COVID-19 pandemic on mortality, complications, and healthcare utilization among people with Chronic Kidney Disease in Hong Kong: an interrupted time series analysis.

Journal of nephrology·2026

Same journal

Methods for incorporating test result information within the high-dimensional propensity score framework: application in UK electronic health record data.

BMC medical research methodology·2026

Same journal

Sparse multi-way DMDC for longitudinal classification in high dimension low sample size data.

BMC medical research methodology·2026

Same journal

Tree-based exploratory identification of predictive biomarkers in non-randomized data.

BMC medical research methodology·2026

Same journal

Comparative evaluation of interrupted time series analytical methods for healthcare quality improvement research: a Monte Carlo simulation study.

BMC medical research methodology·2026

Same journal

Methodological advances in claims-based dementia algorithms: integrating medication and clinical data for medicare populations.

BMC medical research methodology·2026

Same journal

An interpretable XGboost algorithm for predicting 30-day mortality in acute pancreatitis using routine biomarkers.

BMC medical research methodology·2026


Generative adversarial networks for imputing missing data for big data clinical research.

Weinan Dong¹, Daniel Yee Tak Fong², Jin-Sun Yoon³

¹Department of Family Medicine and Primary Care, Faculty of Medicine, University of Hong Kong, Hong Kong, Hong Kong SAR, China.

BMC Medical Research Methodology, April 21, 2021

Summary

This summary is machine-generated.

Generative adversarial imputation nets (GAIN) accurately impute missing clinical data, outperforming MICE and missForest, especially at high missingness rates. GAIN is also substantially faster than missForest, offering an efficient solution for big data research.

Keywords: Big data; Clinical research; Generative adversarial network; Machine learning; Missing data imputation


Area of Science:

Machine Learning in Clinical Research
Data Imputation Techniques
Big Data Analytics

Background:

Missing data is a significant challenge in clinical research.
Generative Adversarial Imputation Nets (GAIN) is a novel machine learning approach for data imputation.
GAIN's performance on large, real-world clinical datasets had not yet been evaluated.
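The background names GAIN without describing it. For orientation, the following is a minimal sketch of the data plumbing and loss terms in the original GAIN formulation (Yoon et al., 2018) as we understand it: the generator sees observed values plus noise in the missing cells, the discriminator guesses per cell whether an entry was observed, and a hint matrix reveals part of the true mask to the discriminator. It omits the neural networks and the training loop entirely. All function names, the noise scale, the hint rate and the weight `alpha` are illustrative assumptions, not the configuration used in this study.

```python
import math
import random

# Sketch of GAIN's data plumbing and loss terms (after Yoon et al., 2018).
# Matrices are lists of rows; m[i][j] == 1 means x[i][j] was observed.
# All names and default values here are illustrative assumptions.


def generator_input(x, m, noise_scale=0.01, rng=random):
    """X_tilde = M*X + (1-M)*Z: keep observed cells, fill missing ones with small uniform noise."""
    return [[xi if mi == 1 else rng.uniform(0.0, noise_scale)
             for xi, mi in zip(xrow, mrow)]
            for xrow, mrow in zip(x, m)]


def hint_matrix(m, hint_rate=0.9, rng=random):
    """H = B*M + 0.5*(1-B), B ~ Bernoulli(hint_rate): reveals part of M to the discriminator."""
    hint = []
    for mrow in m:
        hrow = []
        for mi in mrow:
            b = 1 if rng.random() < hint_rate else 0
            hrow.append(b * mi + 0.5 * (1 - b))
        hint.append(hrow)
    return hint


def combine(x_tilde, g_out, m):
    """X_hat = M*X_tilde + (1-M)*G: observed cells kept, missing cells taken from the generator."""
    return [[xt if mi == 1 else g for xt, g, mi in zip(xrow, grow, mrow)]
            for xrow, grow, mrow in zip(x_tilde, g_out, m)]


def discriminator_loss(d_prob, m, eps=1e-8):
    """Cross-entropy of D's per-cell guess 'was this entry observed?' against the true mask."""
    total, n = 0.0, 0
    for drow, mrow in zip(d_prob, m):
        for d, mi in zip(drow, mrow):
            total += mi * math.log(d + eps) + (1 - mi) * math.log(1 - d + eps)
            n += 1
    return -total / n


def generator_loss(d_prob, m, x, g_out, alpha=10.0, eps=1e-8):
    """Adversarial term on imputed cells plus alpha * MSE reconstruction on observed cells."""
    adv, n = 0.0, 0
    rec, n_obs = 0.0, 0
    for drow, mrow, xrow, grow in zip(d_prob, m, x, g_out):
        for d, mi, xi, gi in zip(drow, mrow, xrow, grow):
            adv += (1 - mi) * math.log(d + eps)  # G wants D to call imputed cells "observed"
            n += 1
            if mi == 1:  # reconstruction only where the true value is known
                rec += (xi - gi) ** 2
                n_obs += 1
    return -adv / n + alpha * rec / max(n_obs, 1)


# Toy usage: a 2x2 matrix with one missing cell per row (None marks missing).
x = [[1.0, None], [None, 2.0]]
m = [[1, 0], [0, 1]]
rng = random.Random(0)
x_tilde = generator_input(x, m, rng=rng)
hints = hint_matrix(m, rng=rng)
```

In a real implementation the generator and discriminator would be neural networks fed `x_tilde` (with the mask) and `x_hat` (with the hints), and the two losses would be minimized alternately by gradient descent.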

Purpose of the Study:

Evaluate the accuracy of GAIN for imputing missing values in large clinical datasets with mixed variable types.
Assess the computational efficiency of GAIN.
Compare GAIN's performance against MICE and missForest.

Main Methods:

Used two large real-world clinical datasets (diabetes and hypertension cohorts).
Simulated data missing at random (MAR) at missingness rates of 20% and 50%.
Measured imputation accuracy using the normalized root mean square error (NRMSE) for continuous variables and the proportion of falsely classified entries (PFC) for categorical variables.
Recorded computation time for each imputation method.
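The evaluation design above can be illustrated with a small harness. This is a minimal sketch under stated assumptions, not the study's code: MAR missingness is generated by deleting values with a probability that rises with an always-observed covariate; NRMSE is taken as the root mean squared error over the deleted cells divided by the population standard deviation of their true values (a common missForest-style definition); and PFC is the share of deleted categorical cells imputed with the wrong category. The variables `age`, `sbp` and `smoker` are synthetic and hypothetical, and mean/mode imputation is a stand-in baseline, not one of the methods compared in the study.

```python
import math
import random
import statistics

# Illustrative evaluation harness; NOT the study's code.
# Assumptions: MAR deletion driven by an always-observed covariate,
# NRMSE = RMSE / SD of the true values over deleted cells (population SD),
# PFC = share of deleted categorical cells imputed with the wrong category.


def mar_mask(driver, rate, rng):
    """Pick round(rate * n) rows to delete; higher driver values are more likely (MAR)."""
    n = len(driver)
    k = round(rate * n)
    lo, hi = min(driver), max(driver)
    span = (hi - lo) or 1.0
    weights = [0.5 + (d - lo) / span for d in driver]  # all weights in [0.5, 1.5]
    # Efraimidis-Spirakis: the k largest keys u**(1/w) form a weighted sample without replacement.
    keys = sorted(((rng.random() ** (1.0 / w), i) for i, w in enumerate(weights)), reverse=True)
    return {i for _, i in keys[:k]}


def nrmse(true_vals, imputed_vals):
    mse = statistics.fmean((t - p) ** 2 for t, p in zip(true_vals, imputed_vals))
    return math.sqrt(mse / statistics.pvariance(true_vals))


def pfc(true_labels, imputed_labels):
    wrong = sum(t != p for t, p in zip(true_labels, imputed_labels))
    return wrong / len(true_labels)


# Synthetic mixed-type data (hypothetical variables).
rng = random.Random(42)
n = 1000
age = [rng.gauss(60, 10) for _ in range(n)]            # driver, always observed
sbp = [100 + 0.5 * a + rng.gauss(0, 8) for a in age]   # continuous, gets deleted
smoker = ["yes" if rng.random() < 0.3 else "no" for _ in range(n)]  # categorical, gets deleted

for rate in (0.2, 0.5):
    miss = sorted(mar_mask(age, rate, rng))
    missing = set(miss)
    # Baseline mean/mode imputation fitted on the observed rows only.
    mean_sbp = statistics.fmean(v for i, v in enumerate(sbp) if i not in missing)
    mode_smoker = statistics.mode(s for i, s in enumerate(smoker) if i not in missing)
    e_cont = nrmse([sbp[i] for i in miss], [mean_sbp] * len(miss))
    e_cat = pfc([smoker[i] for i in miss], [mode_smoker] * len(miss))
    print(f"{rate:.0%} MAR: NRMSE={e_cont:.3f}  PFC={e_cat:.3f}")
```

Under this NRMSE definition any constant fill, including the mean, scores at least 1, because the mean squared error equals the variance of the true values plus the squared bias of the constant. That floor is a useful sanity check when GAIN, MICE or missForest output is substituted for the baseline.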

Main Results:

GAIN and missForest were more accurate than MICE.
GAIN demonstrated superior accuracy over missForest at 50% missingness.
GAIN excelled in imputing skewed continuous and imbalanced categorical variables.
GAIN was substantially faster to compute (32 min vs. 1300 min for missForest on 50,000 cases).
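Expressed as a ratio, the reported timings for 50,000 cases imply roughly a fortyfold speed-up. This is simple arithmetic on the two figures given and says nothing about how the gap scales with dataset size or hardware:

$$\frac{1300\ \text{min}}{32\ \text{min}} \approx 40.6, \qquad \frac{32}{1300} \approx 2.5\%$$

That is, about 21.7 hours for missForest versus about half an hour for GAIN.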

Conclusions:

GAIN is a more accurate and efficient method for imputing missing data in large clinical datasets than MICE and missForest.
GAIN is robust to high missingness rates (up to 50%).
GAIN's speed and accuracy make it a promising tool for big clinical data research.