Does Cohort Selection Affect Machine Learning from Clinical Data?

Atefehsadat Haghighathoseini¹, Janusz Wojtusiak¹, Hua Min¹

¹George Mason University, Fairfax, VA, USA.

AMIA ... Annual Symposium Proceedings. AMIA Symposium

May 26, 2025

Summary

This summary is machine-generated.

Cohort selection significantly impacts machine learning (ML) model quality and fairness in clinical data analysis. Arbitrary data processing decisions can introduce bias, affecting patient outcome predictions, especially for diverse populations.

Keywords:

Data Processing; Machine Learning; National COVID Cohort Collaborative (N3C); Prediction; Selection Bias

Area of Science:

Clinical Informatics
Machine Learning in Healthcare
Health Equity Research

Background:

Machine learning (ML) models are increasingly used for predicting patient outcomes.
Clinical data preprocessing involves critical decisions that can influence model performance.
The National COVID Cohort Collaborative (N3C) provides a large dataset for studying these effects.

Purpose of the Study:

To investigate the impact of cohort selection strategies on ML model quality and fairness.
To analyze how arbitrary data processing decisions affect model predictions.
To assess biases related to social determinants of health in ML models.

Main Methods:

Experiments conducted using the N3C dataset.
Generation of 16 distinct datasets by making four arbitrary cohort selection decisions.
Evaluation of dataset variations in size and properties.
Assessment of ML model performance across different cohorts.
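
The dataset generation step above can be sketched as follows. Sixteen datasets from four decisions is consistent with each decision being a yes/no choice (2^4 = 16), but the summary does not name the decisions, so every filter and field name below is a hypothetical placeholder, and the records are invented toy data rather than N3C data.

```python
from itertools import product

# Hypothetical yes/no cohort decisions; the summary does not list the four
# decisions actually used, so these filters are illustrative assumptions.
DECISIONS = {
    "require_lab_confirmed": lambda r: r["lab_confirmed"],
    "adults_only": lambda r: r["age"] >= 18,
    "require_prior_visit": lambda r: r["prior_visits"] > 0,
    "exclude_missing_zip": lambda r: r["zip"] is not None,
}


def build_cohorts(records):
    """Return {flags: records} for every on/off combination of the decisions."""
    names = list(DECISIONS)
    cohorts = {}
    for flags in product([False, True], repeat=len(names)):
        # Apply only the filters that are switched on for this combination.
        active = [DECISIONS[n] for n, on in zip(names, flags) if on]
        cohorts[flags] = [r for r in records if all(f(r) for f in active)]
    return cohorts


# Toy records, invented for illustration only.
records = [
    {"age": 45, "lab_confirmed": True, "prior_visits": 3, "zip": "22030"},
    {"age": 16, "lab_confirmed": True, "prior_visits": 0, "zip": "22031"},
    {"age": 70, "lab_confirmed": False, "prior_visits": 1, "zip": None},
    {"age": 30, "lab_confirmed": False, "prior_visits": 0, "zip": "22032"},
]

cohorts = build_cohorts(records)
for flags, members in cohorts.items():
    print(flags, len(members))
```

Each key is a tuple of four booleans in the order the decisions are declared, so the size and composition of any two cohorts can be compared directly.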

Main Results:

Significant differences observed in dataset characteristics based on inclusion/exclusion criteria.
High potential for bias introduced by arbitrary cohort selection.
Substantial variations in ML model performance when trained on different cohorts.
Disparities in model performance highlighted when comparing cohorts with differing inclusion criteria.
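
One way to make such disparities concrete is to compare a per-subgroup metric for models trained on different cohorts. The sketch below uses plain accuracy and an urban/rural split purely as assumptions; the summary does not state which metrics or subgroups the authors used, and the predictions are invented.

```python
from collections import defaultdict


def accuracy_by_group(rows):
    """rows: iterable of (group, y_true, y_pred). Returns {group: accuracy}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, y_true, y_pred in rows:
        total[group] += 1
        correct[group] += int(y_true == y_pred)
    return {g: correct[g] / total[g] for g in total}


def accuracy_gap(rows):
    """Largest minus smallest subgroup accuracy; 0.0 means equal accuracy."""
    acc = accuracy_by_group(rows)
    return max(acc.values()) - min(acc.values())


# Invented predictions from models trained on two hypothetical cohorts.
results = {
    "cohort_A": [("urban", 1, 1), ("urban", 0, 0), ("rural", 1, 0), ("rural", 0, 0)],
    "cohort_B": [("urban", 1, 1), ("urban", 0, 0), ("rural", 1, 1), ("rural", 0, 0)],
}

for name, rows in results.items():
    print(name, accuracy_by_group(rows), accuracy_gap(rows))
```

A larger gap for one cohort than another would suggest that the inclusion criteria, rather than the learning algorithm, are driving the difference in subgroup performance.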

Conclusions:

Cohort selection is a critical factor influencing ML model bias and fairness.
Transparent and justified data processing decisions are essential for reliable clinical ML.
Further research is needed to mitigate biases associated with social determinants of health in ML models.