Balancing Inferential Integrity and Disclosure Risk via Model Targeted Masking and Multiple Imputation.

Bei Jiang¹, Adrian E Raftery², Russell J Steele³

¹Department of Mathematical and Statistical Sciences, University of Alberta, Edmonton, AB T6G 2G1, Canada.

Journal of the American Statistical Association

October 11, 2024

Summary

This summary is machine-generated.

A new data masking method that combines data augmentation with multiply imputed synthetic datasets (DA-MI) achieves 0% identity disclosure risk while preserving the utility of the research data. The approach supports data sharing for government-funded studies without compromising participant privacy.

Keywords:

Data augmentation, Disclosure control, Joint modeling, Rare disease, Synthetic data

Area of Science:

Statistics
Data Privacy
Health Research

Background:

Open data sharing is crucial for research reproducibility but raises privacy concerns.
Multiply imputed (MI) synthetic datasets are used to protect participant identity but can lead to information loss.
Existing methods may weaken or invalidate inferences from synthetic datasets.

Purpose of the Study:

To investigate a novel masking framework with data augmentation (DA) and a tuning mechanism.
To balance identity disclosure protection with data utility preservation.
To evaluate the effectiveness of the DA-MI strategy on a restricted-use Canadian Scleroderma Research Group (CSRG) dataset.

Main Methods:

Used a new masking framework incorporating data augmentation (DA) and multiple imputation (MI).
Employed a tuning mechanism to balance data utility and identity protection.
Applied the DA-MI strategy to analyze work-disability and interstitial lung disease outcomes within the CSRG dataset.
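
The summary does not say how estimates from the multiply imputed synthetic datasets are pooled. As a minimal sketch, assuming the datasets are partially synthetic and that the combining rules commonly used for that setting apply (point estimate = mean of the per-dataset estimates, total variance = mean within-dataset variance + between-dataset variance / m), the pooling step might look like the following. The function name and inputs are hypothetical and are not taken from the paper.

```python
from statistics import mean, variance


def combine_partially_synthetic(estimates, variances):
    """Pool one scalar estimate across m synthetic datasets.

    Assumption (not stated in the source): the datasets are partially
    synthetic, so the total variance is u_bar + b / m.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two datasets with matching variances")

    q_bar = mean(estimates)    # pooled point estimate
    u_bar = mean(variances)    # average within-dataset variance
    b = variance(estimates)    # between-dataset variance (m - 1 denominator)
    total_var = u_bar + b / m  # total variance under the assumed rule
    return q_bar, total_var


# Hypothetical estimates and variances from m = 3 synthetic datasets.
q_bar, total_var = combine_partially_synthetic([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

Other synthetic-data designs use different variance formulas, so the rule here should be checked against the paper before any reuse.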

Main Results:

The DA-MI strategy achieved 0% identity disclosure risk.
All inferential conclusions were preserved.
Confidence interval (CI) overlap with the original data remained high (98.5% and 95.5% on average, minimum 91%).
Conventional methods showed markedly lower CI overlap (73.9%-91.8%, minimum 28.1%).
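
The summary does not define the CI overlap measure. A minimal sketch, assuming the widely used definition that averages the overlapping length as a fraction of each interval's own length, is shown below. The function name is hypothetical, and returning 0 for disjoint intervals is a simplification chosen here.

```python
def ci_overlap(orig_ci, synth_ci):
    """Return the overlap between two confidence intervals, in [0, 1].

    Each argument is a (lower, upper) tuple. The value is the mean of
    overlap_length / original_length and overlap_length / synthetic_length.
    Assumption: disjoint intervals are reported as 0 rather than negative.
    """
    lo_o, hi_o = orig_ci
    lo_s, hi_s = synth_ci
    if hi_o <= lo_o or hi_s <= lo_s:
        raise ValueError("each interval needs lower < upper")

    overlap = min(hi_o, hi_s) - max(lo_o, lo_s)
    if overlap <= 0:
        return 0.0  # intervals do not intersect
    return 0.5 * (overlap / (hi_o - lo_o) + overlap / (hi_s - lo_s))


# Hypothetical intervals: identical, half-overlapping, and disjoint.
same = ci_overlap((0.0, 1.0), (0.0, 1.0))
half = ci_overlap((0.0, 2.0), (1.0, 3.0))
none = ci_overlap((0.0, 1.0), (2.0, 3.0))
```

Under this definition a value of 1 means the synthetic-data interval coincides with the original-data interval, so the averages reported above would correspond to intervals that nearly coincide.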

Conclusions:

The DA-MI masking framework effectively protects participant identities.
This method facilitates the sharing of valuable research data.
DA-MI preserves data utility, ensuring reliable research conclusions.