Challenges in Obtaining Valid Causal Effect Estimates with Machine Learning Algorithms.

Ashley I Naimi¹, Alan E Mishler², Edward H Kennedy²

¹Department of Epidemiology, Emory University.

American Journal of Epidemiology

July 16, 2021

Summary

This summary is machine-generated.

Machine learning (ML) methods for causal effect estimation can yield biased estimates and poor confidence interval coverage. Doubly robust estimators, combined with sample splitting, confounder interactions, and richly specified ML algorithms, are needed for reliable results, while singly robust ML methods should be avoided.

Keywords:

causal inference; doubly-robust estimation; epidemiologic methods; machine learning; nonparametric methods; semiparametric theory

Area of Science:

Causal inference
Statistical modeling
Machine learning applications

Background:

Machine learning (ML) methods are increasingly proposed for causal effect estimation due to their flexibility.
However, ML algorithms can sometimes underperform traditional parametric regression.
The performance of ML-based estimators, particularly singly and doubly robust types, therefore requires thorough investigation.

Purpose of the Study:

To evaluate the performance of ML-based singly and doubly robust estimators for causal effect estimation.
To compare these methods against parametric regression under varying confounding scenarios.
To identify conditions and techniques that improve the reliability of ML-based causal inference.

Main Methods:

Conducted 100 Monte Carlo simulations with sample sizes of 200, 1200, and 5000.
Investigated bias and confidence interval coverage in simple and complex confounding scenarios.
Assessed ML algorithms within singly and doubly robust estimation frameworks, including techniques such as sample splitting and confounder interactions.
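
The evaluation loop described above (repeated simulation, then bias and confidence interval coverage) can be sketched as follows. This is a minimal illustration, not the study's code: the data-generating model, the true effect of 1.0, the single confounder, and the adjusted linear regression used as the estimator are all assumptions chosen to keep the example short. Only the number of simulations and the sample sizes are taken from the summary.

```python
import numpy as np

def simulate(n, rng, true_effect=1.0):
    """Draw one dataset in which a confounder c affects both exposure x and outcome y.
    Assumption: this data-generating model is illustrative, not the study's."""
    c = rng.normal(size=n)
    p = 1.0 / (1.0 + np.exp(-0.5 * c))      # exposure probability depends on c
    x = rng.binomial(1, p)
    y = true_effect * x + c + rng.normal(size=n)
    return c, x, y

def ols_effect(c, x, y):
    """Confounder-adjusted OLS estimate of the exposure coefficient and its standard error."""
    design = np.column_stack([np.ones_like(c), x, c])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sigma2 = resid @ resid / (len(y) - design.shape[1])
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return beta[1], np.sqrt(cov[1, 1])

def monte_carlo(n, n_sims=100, true_effect=1.0, seed=0):
    """Return (bias, coverage of the nominal 95% interval) over n_sims simulated datasets."""
    rng = np.random.default_rng(seed)
    estimates, covered = [], []
    for _ in range(n_sims):
        est, se = ols_effect(*simulate(n, rng, true_effect))
        estimates.append(est)
        covered.append(abs(est - true_effect) <= 1.96 * se)
    return np.mean(estimates) - true_effect, np.mean(covered)

for n in (200, 1200, 5000):
    bias, coverage = monte_carlo(n)
    print(f"n={n}: bias={bias:.3f}, coverage={coverage:.2f}")
```

Because the regression here is correctly specified for the simulated data, bias should be near zero and coverage near 95%. Swapping in a misspecified or more flexible estimator in place of `ols_effect` is how one would probe the scenarios the study compares.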

Main Results:

In simple confounding, doubly robust ML estimators outperformed singly robust ones.
In complex nonlinear confounding, singly robust ML estimators showed significant bias, similar to misspecified parametric models.
Although doubly robust estimators were less biased in complex scenarios, confidence interval coverage remained suboptimal unless sample splitting, confounder interactions, and richly specified ML models were used.
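
For orientation, the distinction between singly and doubly robust estimators can be written out for the average treatment effect. The summary does not say which estimators were used, so the three forms below (outcome regression, inverse probability weighting, and augmented inverse probability weighting, or AIPW) are offered as standard examples. The notation is an assumption: binary exposure X, outcome Y, confounders C, fitted outcome model \hat{\mu}_x(C) for E[Y | X = x, C], and fitted propensity score \hat{\pi}(C) for P(X = 1 | C).

```latex
\begin{align*}
% Singly robust: relies on the outcome model alone
\hat{\psi}_{\mathrm{OR}} &= \frac{1}{n}\sum_{i=1}^{n}
  \bigl[\hat{\mu}_1(C_i) - \hat{\mu}_0(C_i)\bigr] \\
% Singly robust: relies on the propensity score model alone
\hat{\psi}_{\mathrm{IPW}} &= \frac{1}{n}\sum_{i=1}^{n}
  \left[\frac{X_i Y_i}{\hat{\pi}(C_i)} - \frac{(1 - X_i)\,Y_i}{1 - \hat{\pi}(C_i)}\right] \\
% Doubly robust: outcome-model prediction plus a weighted residual correction
\hat{\psi}_{\mathrm{AIPW}} &= \frac{1}{n}\sum_{i=1}^{n}
  \left[\hat{\mu}_1(C_i) - \hat{\mu}_0(C_i)
  + \frac{X_i\,\bigl(Y_i - \hat{\mu}_1(C_i)\bigr)}{\hat{\pi}(C_i)}
  - \frac{(1 - X_i)\,\bigl(Y_i - \hat{\mu}_0(C_i)\bigr)}{1 - \hat{\pi}(C_i)}\right]
\end{align*}
```

The AIPW estimator is consistent if either the outcome model or the propensity score model is correctly specified, and its bias depends on the product of the two models' errors rather than on either error alone. That is one way to read the result above: a slowly converging ML fit can bias a singly robust estimator directly, whereas the doubly robust form is less sensitive to it.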

Conclusions:

ML-based singly robust methods for causal inference are not recommended due to potential bias.
A combination of doubly robust estimation, sample splitting, confounder interactions, and richly specified ML algorithms is essential for reliable causal effect estimation.
Careful implementation is necessary to harness the benefits of ML in causal inference.
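
The recommended combination (doubly robust estimation, sample splitting, and confounder interactions) can be sketched as a cross-fitted AIPW estimator. This is a hedged illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the estimand is assumed to be the average treatment effect of a binary exposure, and the nuisance models are plain logistic and linear regressions on an interaction-expanded design. Those regressions stand in for the richly specified ML learners the conclusions call for, only to keep the example dependency-free.

```python
import numpy as np

def fit_logistic(design, x, n_iter=25):
    """Logistic regression coefficients via Newton-Raphson."""
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-design @ beta))
        w = p * (1.0 - p)
        hessian = design.T @ (design * w[:, None])
        beta += np.linalg.solve(hessian, design.T @ (x - p))
    return beta

def features(c):
    """Intercept, main terms, and all pairwise products (squares and interactions)."""
    k = c.shape[1]
    pairs = [c[:, i] * c[:, j] for i in range(k) for j in range(i, k)]
    return np.column_stack([np.ones(len(c)), c] + pairs)

def cross_fit_aipw(c, x, y, n_folds=2, seed=0):
    """Cross-fitted AIPW estimate of the average treatment effect and a 95% interval."""
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    f = features(c)
    phi = np.empty(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        # Sample splitting: nuisance models are fitted on the other folds only.
        ps_beta = fit_logistic(f[train], x[train])
        pi = 1.0 / (1.0 + np.exp(-f[test] @ ps_beta))
        pi = np.clip(pi, 0.01, 0.99)  # assumption: truncate extreme propensities
        mu = {}
        for a in (0, 1):
            arm = train & (x == a)
            coef, *_ = np.linalg.lstsq(f[arm], y[arm], rcond=None)
            mu[a] = f[test] @ coef
        xt, yt = x[test], y[test]
        # AIPW contribution for each held-out observation
        phi[test] = (mu[1] - mu[0]
                     + xt * (yt - mu[1]) / pi
                     - (1 - xt) * (yt - mu[0]) / (1 - pi))
    est = phi.mean()
    se = phi.std(ddof=1) / np.sqrt(n)
    return est, (est - 1.96 * se, est + 1.96 * se)

# Illustrative data (assumed, not from the study): true effect 1.0, nonlinear confounding.
rng = np.random.default_rng(1)
n = 4000
c = rng.normal(size=(n, 2))
pi_true = 1.0 / (1.0 + np.exp(-(0.5 * c[:, 0] - 0.5 * c[:, 1] + 0.3 * c[:, 0] * c[:, 1])))
x = rng.binomial(1, pi_true)
y = 1.0 * x + c[:, 0] + c[:, 1] ** 2 + 0.5 * c[:, 0] * c[:, 1] + rng.normal(size=n)
est, ci = cross_fit_aipw(c, x, y)
print(f"estimate={est:.3f}, 95% CI=({ci[0]:.3f}, {ci[1]:.3f})")
```

Each observation's contribution is computed from models that never saw that observation, which is the role sample splitting plays when flexible learners would otherwise overfit. In practice the two regressions would be replaced by the ML algorithms of choice, with the rest of the structure unchanged.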