



Does Cohort Selection Affect Machine Learning from Clinical Data?

Atefehsadat Haghighathoseini¹, Janusz Wojtusiak¹, Hua Min¹

  • ¹ George Mason University, Fairfax, VA, USA.

AMIA ... Annual Symposium Proceedings. AMIA Symposium | May 26, 2025
Summary
This summary is machine-generated.

Cohort selection significantly impacts machine learning (ML) model quality and fairness in clinical data analysis. Arbitrary data processing decisions can introduce bias, affecting patient outcome predictions, especially for diverse populations.

Keywords:
Data Processing; Machine Learning; National COVID Cohort Collaborative (N3C); Prediction; Selection Bias


Area of Science:

  • Clinical Informatics
  • Machine Learning in Healthcare
  • Health Equity Research

Background:

  • Machine learning (ML) models are increasingly used for predicting patient outcomes.
  • Clinical data preprocessing involves critical decisions that can influence model performance.
  • The National COVID Cohort Collaborative (N3C) provides a large dataset for studying these effects.

Purpose of the Study:

  • To investigate the impact of cohort selection strategies on ML model quality and fairness.
  • To analyze how arbitrary data processing decisions affect model predictions.
  • To assess biases related to social determinants of health in ML models.

Main Methods:

  • Experiments conducted using the N3C dataset.
  • Generation of 16 distinct datasets by making four arbitrary cohort selection decisions.
  • Evaluation of dataset variations in size and properties.
  • Assessment of ML model performance across different cohorts.
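The way four decisions can yield 16 datasets can be sketched in a few lines, assuming each decision is a binary include/exclude choice (2^4 = 16). The summary does not name the four decisions, so the filter names, record fields, and sample records below are hypothetical placeholders, not the authors' criteria or N3C data.

```python
from itertools import product

# Hypothetical binary cohort-selection decisions (assumed for illustration;
# the source does not list the actual four decisions).
DECISIONS = {
    "require_known_sex": lambda r: r["sex"] is not None,
    "adults_only": lambda r: r["age"] >= 18,
    "require_prior_visit": lambda r: r["prior_visits"] > 0,
    "exclude_missing_zip": lambda r: r["zip"] is not None,
}


def build_cohorts(records):
    """Return {flags: cohort} for every on/off combination of the decisions."""
    names = list(DECISIONS)
    cohorts = {}
    for flags in product([False, True], repeat=len(names)):
        # Apply only the filters switched on in this combination.
        active = [DECISIONS[n] for n, on in zip(names, flags) if on]
        cohorts[flags] = [r for r in records if all(f(r) for f in active)]
    return cohorts


# Tiny synthetic records standing in for patient rows.
records = [
    {"sex": "F", "age": 34, "prior_visits": 2, "zip": "22030"},
    {"sex": None, "age": 51, "prior_visits": 0, "zip": "22031"},
    {"sex": "M", "age": 16, "prior_visits": 1, "zip": None},
    {"sex": "F", "age": 70, "prior_visits": 0, "zip": None},
]

cohorts = build_cohorts(records)
sizes = {flags: len(cohort) for flags, cohort in cohorts.items()}
for flags, size in sizes.items():
    print(flags, size)
```

Even on four toy records the cohort size ranges from all records (no filter applied) down to one (all filters applied), which is the kind of variation in size and properties the methods evaluate.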

Main Results:

  • Significant differences observed in dataset characteristics based on inclusion/exclusion criteria.
  • High potential for bias introduced by arbitrary cohort selection.
  • Substantial variations in ML model performance when trained on different cohorts.
  • Disparities in model performance highlighted when comparing cohorts with differing inclusion criteria.
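One simple way to quantify such disparities is to compute a metric per subgroup and report the gap between the best and worst group. The sketch below uses accuracy and made-up labels; the summary does not state which metrics or subgroups the authors used, so the function names, the metric, and the groups "A" and "B" are assumptions for illustration only.

```python
from collections import defaultdict


def subgroup_accuracy(y_true, y_pred, groups):
    """Return accuracy computed separately for each group label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


def accuracy_gap(per_group):
    """Difference between the best and worst per-group accuracy."""
    return max(per_group.values()) - min(per_group.values())


# Synthetic labels and predictions (not results from the study).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

per_group = subgroup_accuracy(y_true, y_pred, groups)
print(per_group, accuracy_gap(per_group))
```

Repeating this calculation for a model trained on each cohort would show whether a given inclusion criterion widens or narrows the gap between groups.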

Conclusions:

  • Cohort selection is a critical factor influencing ML model bias and fairness.
  • Transparent and justified data processing decisions are essential for reliable clinical ML.
  • Further research is needed to mitigate biases associated with social determinants of health in ML models.