Demystifying Statistical Inference When Using Machine Learning in Causal Research.

Laura B Balzer¹, Ted Westling²

  • ¹Department of Biostatistics & Epidemiology, University of Massachusetts Amherst, Amherst, Massachusetts, United States.

American Journal of Epidemiology | July 16, 2021
Summary
This summary is machine-generated.

The authors discuss the use of machine learning for causal inference in public health. They find that targeted maximum likelihood estimation (TMLE), paired with a suitably chosen Super Learner library, can achieve valid statistical inference without sample-splitting.

Keywords:
Causal inference, Super Learner, TMLE, cross-fitting, cross-validation, double robust, machine learning, non-parametric

Area of Science:

  • Epidemiology
  • Biostatistics
  • Machine Learning

Background:

  • Causal inference in public health increasingly uses machine learning (ML).
  • Valid statistical inference is crucial for reliable ML applications in research.
  • Methodological advancements are needed to address challenges in ML-based causal inference.

Purpose of the Study:

  • To comment on recommendations for valid statistical inference using ML in causal research.
  • To highlight how the choice of the Super Learner library (the set of candidate algorithms in the ensemble) affects inference.
  • To demonstrate alternative approaches for achieving valid inference.

Main Methods:

  • Review of prominent methodological work in ML for causal inference.
  • Simulation studies to evaluate targeted maximum likelihood estimation (TMLE) and Super Learner libraries.
  • Comparison of TMLE with and without sample-splitting.
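The sample-splitting (cross-fitting) idea named in the last bullet can be sketched as follows. This is a minimal illustration, not the authors' simulation code: the function names (`k_fold_indices`, `fit_stratum_means`, `cross_fit`), the toy stratum-mean learner, and the toy data are all assumptions made here for the example. The point it shows is only the mechanics: each unit's nuisance prediction comes from a model fitted on data that excludes that unit's fold.

```python
import random
from typing import Dict, List, Sequence


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_stratum_means(w: Sequence[int], y: Sequence[float]) -> Dict[int, float]:
    """Toy learner (hypothetical): mean outcome within each covariate level."""
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for wi, yi in zip(w, y):
        sums[wi] = sums.get(wi, 0.0) + yi
        counts[wi] = counts.get(wi, 0) + 1
    return {level: sums[level] / counts[level] for level in sums}


def cross_fit(w: Sequence[int], y: Sequence[float], k: int = 5, seed: int = 0) -> List[float]:
    """Return out-of-fold predictions: no unit is predicted by a model that saw it."""
    n = len(y)
    preds = [0.0] * n
    for held_out in k_fold_indices(n, k, seed):
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        model = fit_stratum_means([w[i] for i in train], [y[i] for i in train])
        # Fall back to the training mean if a covariate level is unseen in training.
        fallback = sum(y[i] for i in train) / len(train)
        for i in held_out:
            preds[i] = model.get(w[i], fallback)
    return preds


# Toy data (assumed for illustration): the outcome equals the binary covariate.
w_demo = [i % 2 for i in range(20)]
y_demo = [float(wi) for wi in w_demo]
folds_demo = k_fold_indices(len(y_demo), k=5)
preds_demo = cross_fit(w_demo, y_demo, k=5)
```

Without sample-splitting, the same learner would simply be fitted once on all units and used to predict those same units; the comparison in the bullet above is between these two ways of obtaining nuisance estimates.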

Main Results:

  • Targeted maximum likelihood estimation (TMLE) can achieve low bias and valid statistical inference without sample-splitting.
  • A Super Learner library excluding tree-based methods but including regression splines was effective.
  • The necessity of extremely data-adaptive algorithms and sample-splitting is problem-dependent.
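To make the first result concrete, here is a minimal sketch of TMLE for an average treatment effect, fitted on the full sample with no sample-splitting. Everything in it is an assumption for illustration: the toy cell counts, the function names, the deliberately misspecified initial outcome regression (it ignores the covariate W), and the use of simple stratified means in place of a Super Learner. The one-parameter targeting step is solved by bisection on the score equation, and the standard error is a Wald-type estimate from the estimated influence curve.

```python
import math
from typing import List, Tuple


def expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def make_toy_data() -> List[Tuple[int, int, int]]:
    """Hypothetical (W, A, Y) records built from fixed cell counts."""
    cells = [  # (w, a, n_units, n_events)
        (0, 0, 30, 6), (0, 1, 20, 8),
        (1, 0, 20, 10), (1, 1, 30, 24),
    ]
    data: List[Tuple[int, int, int]] = []
    for w, a, n, events in cells:
        data += [(w, a, 1)] * events + [(w, a, 0)] * (n - events)
    return data


def tmle_ate(data: List[Tuple[int, int, int]]) -> Tuple[float, float, float]:
    """Return (initial plug-in estimate, TMLE estimate, standard error)."""
    n = len(data)
    # Initial outcome regression: mean of Y by A only (misspecified on purpose).
    q_init = {}
    for a in (0, 1):
        ys = [y for _, ai, y in data if ai == a]
        q_init[a] = sum(ys) / len(ys)
    # Propensity score P(A = 1 | W), estimated by stratified means.
    g = {}
    for w in (0, 1):
        treated = [a for wi, a, _ in data if wi == w]
        g[w] = sum(treated) / len(treated)

    def clever(a: int, w: int) -> float:
        return a / g[w] - (1 - a) / (1.0 - g[w])

    def q_star(a: int, w: int, eps: float) -> float:
        return expit(logit(q_init[a]) + eps * clever(a, w))

    def score(eps: float) -> float:
        return sum(clever(a, w) * (y - q_star(a, w, eps)) for w, a, y in data)

    # The score is strictly decreasing in eps, so bisection finds its root.
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    eps = (lo + hi) / 2.0

    plug_in = q_init[1] - q_init[0]
    diffs = [q_star(1, w, eps) - q_star(0, w, eps) for w, _, _ in data]
    psi = sum(diffs) / n
    # Estimated influence curve, used for a Wald-type standard error.
    ic = [clever(a, w) * (y - q_star(a, w, eps)) + d - psi
          for (w, a, y), d in zip(data, diffs)]
    se = math.sqrt(sum(v * v for v in ic) / n) / math.sqrt(n)
    return plug_in, psi, se


plug_in_est, tmle_est, tmle_se = tmle_ate(make_toy_data())
ci_95 = (tmle_est - 1.96 * tmle_se, tmle_est + 1.96 * tmle_se)
```

In this toy setup the unadjusted plug-in is 0.32, while the covariate-standardized effect implied by the cell counts is 0.25; because the propensity score is estimated by saturated stratified means, the targeting step moves the estimate to that value. Real applications with continuous covariates and data-adaptive learners do not have this exact-recovery property, which is where the questions about library choice and sample-splitting arise.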

Conclusions:

  • The composition of the Super Learner library is critical for valid causal inference.
  • Sample-splitting may not always be necessary when using TMLE.
  • Further research is needed for practical guidance on selecting ML methods in epidemiology.
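The role of the library can be illustrated with a simplified "discrete" Super Learner, which picks the single candidate with the lowest cross-validated risk rather than forming a weighted ensemble. This is a sketch under stated assumptions, not the Super Learner implementation used in the paper: the two candidate learners, the function names, and the toy data are hypothetical, and squared error is assumed as the loss.

```python
from typing import Callable, Dict, Sequence, Tuple

Predictor = Callable[[float], float]
Learner = Callable[[Sequence[float], Sequence[float]], Predictor]


def fit_mean(x: Sequence[float], y: Sequence[float]) -> Predictor:
    """Candidate 1 (hypothetical): intercept-only model."""
    m = sum(y) / len(y)
    return lambda _x: m


def fit_linear(x: Sequence[float], y: Sequence[float]) -> Predictor:
    """Candidate 2 (hypothetical): simple least-squares line."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = my - slope * mx
    return lambda x_new: intercept + slope * x_new


def cv_risk(learner: Learner, x: Sequence[float], y: Sequence[float], k: int = 5) -> float:
    """K-fold cross-validated mean squared error for one candidate."""
    n = len(y)
    sq_err = 0.0
    for fold in range(k):
        test = list(range(fold, n, k))
        held = set(test)
        train_x = [x[i] for i in range(n) if i not in held]
        train_y = [y[i] for i in range(n) if i not in held]
        predict = learner(train_x, train_y)
        sq_err += sum((y[i] - predict(x[i])) ** 2 for i in test)
    return sq_err / n


def discrete_super_learner(
    library: Dict[str, Learner], x: Sequence[float], y: Sequence[float], k: int = 5
) -> Tuple[str, Dict[str, float]]:
    """Return the name of the lowest-risk candidate and all cross-validated risks."""
    risks = {name: cv_risk(learner, x, y, k) for name, learner in library.items()}
    best = min(risks, key=lambda name: risks[name])
    return best, risks


# Toy data (assumed): an exactly linear relationship.
x_demo = [float(i) for i in range(20)]
y_demo = [2.0 * xi + 1.0 for xi in x_demo]
library_demo: Dict[str, Learner] = {"mean": fit_mean, "linear": fit_linear}
best_demo, risks_demo = discrete_super_learner(library_demo, x_demo, y_demo)
```

The selector can only choose among what the library contains, which is one way to read the first conclusion above: if no candidate is flexible enough for the problem, or if the candidates are so data-adaptive that they overfit, downstream inference can suffer.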