Navigating Severe Class Imbalance in Population Cohort Data.

Joshua Fieggen, Bradley Segal, Emma C Walker

    Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference
    |December 3, 2025
    Summary
    This summary is machine-generated.

    Predicting rare diseases like multiple myeloma with imbalanced data is challenging. Anomaly detection models outperformed traditional methods, but no single approach is universally best; clinical application guides model choice.

    Area of Science:

    • Machine Learning
    • Medical Informatics
    • Genomics

    Background:

    • Class imbalance poses significant challenges for predictive modeling of rare disease outcomes in large population studies.
    • Traditional machine learning algorithms often exhibit biased performance and poor generalizability on imbalanced datasets.
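A minimal sketch of why imbalance distorts evaluation, assuming a hypothetical cohort of 10,000 participants with 10 cases (these counts are illustrative and not taken from the study). A degenerate model that never predicts the rare class still scores very high accuracy while detecting no cases at all:

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 is the rare class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Fraction of all predictions that are correct."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def sensitivity(y_true, y_pred):
    """Fraction of true cases that are detected (0.0 if there are no cases)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical cohort (assumption): 10 cases among 10,000 participants.
y_true = [1] * 10 + [0] * 9990
# A degenerate "classifier" that never predicts the rare class.
y_pred_all_negative = [0] * len(y_true)

acc = accuracy(y_true, y_pred_all_negative)
sens = sensitivity(y_true, y_pred_all_negative)
print(f"accuracy={acc:.3f}, sensitivity={sens:.3f}")
```

Under these assumed counts, accuracy alone would suggest a strong model, which is why class-sensitive metrics matter for rare outcomes.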

    Purpose of the Study:

    • To systematically evaluate various methods for mitigating class imbalance in predicting multiple myeloma using proteomic and clinical data.
    • To compare standard classifiers, resampling techniques, anomaly detection, and a foundation model for rare disease prediction.

    Main Methods:

    • Compared XGBoost, logistic regression, SMOTE resampling, isolation forests, local outlier factor, one-class SVM, autoencoders, and TabPFN.
    • Introduced a sequential XGBoost ensemble (SeqXGB) to prioritize precision.
    • Evaluated models using standard classification performance metrics on UK Biobank data.
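Of the methods listed above, SMOTE is the resampling technique. Its core idea can be sketched in a few lines of plain Python: each synthetic minority sample is an interpolation between a real minority sample and one of its nearest minority neighbours. This is a simplified illustration, not the authors' code and not the API of any resampling library; the function name `smote_like_oversample`, its parameters, and the sample points are all hypothetical.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points by interpolating between a
    randomly chosen minority point and one of its k nearest minority
    neighbours (the core idea behind SMOTE, simplified)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random position on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical two-feature minority samples (assumption, for illustration only).
minority_points = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_like_oversample(minority_points, n_new=6)
print(len(new_points))
```

Because every synthetic point lies on a segment between two existing minority samples, this kind of resampling rebalances class counts without adding new information about the rare class.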

    Main Results:

    • Anomaly detection models demonstrated superior generalization compared to conventional classifiers (XGBoost, logistic regression).
    • SMOTE did not improve, and potentially worsened, predictive performance.
    • SeqXGB significantly reduced false positives but also substantially decreased sensitivity.

    Conclusions:

    • No single method universally addresses class imbalance; model selection must align with the clinical application and the trade-off between false positives and false negatives.
    • Findings emphasize the need to consider the specific clinical purpose (e.g., screening vs. diagnosis) when evaluating machine learning models for rare diseases.
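The trade-off between false positives and false negatives can be illustrated with a decision threshold applied to risk scores. The sketch below assumes ten hypothetical scores and labels (not study data) and two hypothetical thresholds: a low one that might suit screening, where missed cases are costly, and a high one that might suit diagnosis, where false alarms are costly.

```python
def metrics_at_threshold(scores, labels, threshold):
    """Return (sensitivity, precision, false_positives) when every score
    at or above the threshold is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    prec = tp / (tp + fp) if (tp + fp) else 0.0
    return sens, prec, fp


# Hypothetical risk scores and true labels (assumption, for illustration only).
scores = [0.95, 0.80, 0.60, 0.40, 0.70, 0.55, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Low threshold: catches every case but raises some false alarms.
screening = metrics_at_threshold(scores, labels, threshold=0.35)
# High threshold: no false alarms, but half of the cases are missed.
diagnosis = metrics_at_threshold(scores, labels, threshold=0.75)

print("screening (sensitivity, precision, false positives):", screening)
print("diagnosis (sensitivity, precision, false positives):", diagnosis)
```

The same scores yield very different error profiles depending on the threshold, which is one way the intended clinical use can drive model and operating-point selection.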