Stratification bias in low signal microarray studies.

Brian J Parker¹, Simon Günter, Justin Bedo

  • ¹ Statistical Machine Learning Group, NICTA, Canberra, Australia. brian.bj.parker@gmail.com

BMC Bioinformatics | September 4, 2007
Summary
This summary is machine-generated.

Stratification bias negatively impacts performance measures in small biological datasets. Using balanced, stratified cross-validation methods avoids this bias for accurate model evaluation.

Area of Science:

  • Bioinformatics
  • Machine Learning
  • Statistical Analysis

Background:

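  • Small-sample biological datasets, such as microarray studies, require careful analysis to avoid bias.
  • Stratification bias arises when the class proportions in the training and test sets of a validation split differ from those of the full dataset, which can distort performance estimates.
  • Because both sets are drawn from the same fixed dataset, their class-proportion deviations are negatively correlated: an excess of one class in the test set means a shortage of it in the training set. This dependency exacerbates the bias.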
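
The negative correlation described above follows from simple counting. The sketch below is illustrative only: the function name `class_proportions` and the example counts (a balanced dataset of 10 + 10 samples with 4-sample test sets) are assumptions made here, not values from the study.

```python
from fractions import Fraction


def class_proportions(n_pos, n_neg, test_pos, test_size):
    """Positive-class proportion in the test and training parts of one split.

    n_pos, n_neg: class counts in the full dataset.
    test_pos:     number of positives that happened to land in the test set.
    test_size:    total number of test samples.
    (Hypothetical helper, written for this illustration.)
    """
    test_neg = test_size - test_pos
    train_size = n_pos + n_neg - test_size
    if test_size <= 0 or train_size <= 0:
        raise ValueError("both test and training sets must be non-empty")
    if not (0 <= test_pos <= n_pos and 0 <= test_neg <= n_neg):
        raise ValueError("split is impossible for these class counts")
    train_pos = n_pos - test_pos
    return Fraction(test_pos, test_size), Fraction(train_pos, train_size)


# Assumed example: 10 positives, 10 negatives, test sets of 4 samples.
for test_pos in range(5):
    test_frac, train_frac = class_proportions(10, 10, test_pos, 4)
    print(test_pos, float(test_frac), float(train_frac))
```

Each additional positive in the test set raises the test proportion and lowers the training proportion, so the two deviations always point in opposite directions.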

Purpose of the Study:

  • To analyze stratification bias in small sample size biological datasets.
  • To evaluate its impact on common performance measures for classifier evaluation.
  • To identify and recommend bias-avoiding validation techniques.

Main Methods:

  • Simulations were conducted to quantify bias in performance measures.
  • Analysis included commonly used metrics like error rate and Area Under the ROC Curve (AUC).
  • Validation techniques examined included k-fold cross-validation and leave-one-out cross-validation.

Main Results:

  • Common performance measures, especially AUC, exhibit substantial negative bias on low-signal datasets.
  • AUC can be severely underestimated: datasets with no real signal can yield estimates below the chance level of 0.5.
  • Biases were confirmed through simulations and analysis of the van 't Veer breast cancer dataset.
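
A deterministic toy case can make the pessimistic estimates concrete. It is not the paper's simulation: the feature-free "model" below, which scores each held-out sample with the positive fraction of its training set, and the names `auc` and `leave_one_out` are assumptions chosen for illustration. On a balanced dataset with no signal, the expected error rate and AUC are both 0.5, yet unbalanced leave-one-out reports the worst possible values.

```python
def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Tied pairs count as half a correct ranking.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def leave_one_out(labels):
    """Leave-one-out with a feature-free model (an assumption of this sketch).

    Returns (error rate, AUC over the pooled held-out scores).
    """
    scores, errors = [], 0
    for i, y in enumerate(labels):
        train = labels[:i] + labels[i + 1:]
        score = sum(train) / len(train)      # training-set positive fraction
        predicted = 1 if score > 0.5 else 0  # majority-class vote
        errors += predicted != y
        scores.append(score)
    return errors / len(labels), auc(scores, labels)


labels = [1] * 10 + [0] * 10   # balanced labels, no features, so no signal
error_rate, pooled_auc = leave_one_out(labels)
print(error_rate, pooled_auc)  # 1.0 0.0
```

Removing one sample always makes its own class the minority in the training set, so the majority vote is wrong every time and every held-out positive is scored below every held-out negative.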

Conclusions:

  • Stratification bias significantly affects performance metrics, particularly AUC.
  • Averaging per-fold AUC estimates is recommended over pooling test samples.
  • Balanced, stratified cross-validation methods effectively eliminate bias and are recommended for small dataset evaluation.
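
The two recommendations can be compared on a small hand-built example. This is a minimal sketch under stated assumptions, not the authors' procedure: the 12 labels, the fold assignments, and the helper names (`prior_scores`, `pooled_auc`, `mean_fold_auc`) are all hypothetical, and the "classifier" is a feature-free scorer that outputs the training-set positive fraction, so the true AUC is 0.5.

```python
def auc(scores, labels):
    """Share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def prior_scores(labels, folds):
    """Score each test sample with the positive fraction of its training set."""
    scores = [0.0] * len(labels)
    for test_idx in folds:
        held_out = set(test_idx)
        train = [y for i, y in enumerate(labels) if i not in held_out]
        for i in test_idx:
            scores[i] = sum(train) / len(train)
    return scores


def pooled_auc(labels, folds):
    """One AUC computed over the test scores of all folds together."""
    return auc(prior_scores(labels, folds), labels)


def mean_fold_auc(labels, folds):
    """Average of per-fold AUCs (each fold must contain both classes)."""
    scores = prior_scores(labels, folds)
    per_fold = [auc([scores[i] for i in f], [labels[i] for i in f])
                for f in folds]
    return sum(per_fold) / len(per_fold)


# Assumed toy data: 6 positives and 6 negatives, three folds of four samples.
labels = [1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
unstratified = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]  # 3+1, 2+2, 1+3
stratified = [[0, 1, 3, 6], [2, 4, 7, 9], [5, 8, 10, 11]]    # 2+2 in each fold

print(round(pooled_auc(labels, unstratified), 3))     # 0.278
print(round(mean_fold_auc(labels, unstratified), 3))  # 0.5
print(round(pooled_auc(labels, stratified), 3))       # 0.5
```

Pooling across unstratified folds mixes scores produced by training sets with different class proportions, which drags the estimate below 0.5. Averaging within folds, or keeping the class proportions equal in every fold, recovers the chance-level value in this toy case.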