Bias in error estimation when using cross-validation for model selection.

Sudhir Varma¹, Richard Simon

  • ¹ Biometric Research Branch, National Cancer Institute, Bethesda, MD, USA. varmas@mail.nih.gov

BMC Bioinformatics | March 1, 2006
Summary
This summary is machine-generated.

Estimating classifier error using cross-validation (CV) after parameter optimization leads to biased results. A nested CV procedure provides a more accurate error estimate for classifiers like Shrunken Centroids and Support Vector Machines.

Area of Science:

  • Machine Learning
  • Bioinformatics
  • Statistical Modeling

Background:

  • Cross-validation (CV) is a standard technique for estimating classifier prediction error.
  • Recent methods optimize classifiers by selecting parameters that minimize CV error estimates.
  • The validity of using these optimized CV error estimates for independent data prediction is questioned.
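One way to see why the optimized estimate is suspect is a short inequality. The notation here is an assumption introduced for illustration and is not taken from the paper: $\theta$ ranges over the candidate tuning parameters, and $\widehat{\mathrm{Err}}_{\mathrm{CV}}(\theta)$ is the CV error estimate obtained with parameter $\theta$.

```latex
% The minimum over theta is never larger than the value at any fixed theta,
% so the same holds after taking expectations over training sets:
\[
  \mathbb{E}\!\left[\min_{\theta}\,\widehat{\mathrm{Err}}_{\mathrm{CV}}(\theta)\right]
  \;\le\;
  \min_{\theta}\,\mathbb{E}\!\left[\widehat{\mathrm{Err}}_{\mathrm{CV}}(\theta)\right].
\]
```

Reporting the minimized CV error therefore tends to be optimistic even when each individual estimate is roughly unbiased. On class-balanced 'null' data every parameter value has a true error of 0.5, so if each CV estimate is approximately unbiased the right-hand side is close to 0.5, while the left-hand side can fall well below it.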

Purpose of the Study:

  • To evaluate the accuracy of CV error estimates for classifiers optimized using CV.
  • To assess the bias in prediction error estimation after classifier parameter tuning.
  • To compare standard CV with nested CV for error estimation.

Main Methods:

  • Optimized classification parameters for Shrunken Centroids and Support Vector Machines (SVM) using CV on simulated 'null' datasets.

  • Employed 10-fold CV for Shrunken Centroids and Leave-One-Out CV (LOOCV) for SVM.
  • Implemented a nested CV procedure with an inner loop for parameter tuning and an outer loop for error estimation.

Main Results:

  • CV error estimates for optimized classifiers were substantially biased, underestimating true error.
  • Optimized Shrunken Centroids and SVM classifiers showed inflated performance on 'null' data, with error rates below 30% on 18.5% and 38% of datasets, respectively.
  • Performance on independent test sets was no better than chance for optimized classifiers.
  • Nested CV significantly reduced bias, yielding error estimates close to those from independent test sets.

Conclusions:

  • Using CV for error estimation after CV-based parameter tuning results in significantly biased error estimates.
  • Accurate error estimation requires including all algorithmic steps, including parameter tuning, within each CV loop.
  • Nested CV provides a nearly unbiased estimate of a classifier's true error rate.
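
The contrast between the two procedures can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' code: the classifier is a simplified nearest-centroid rule with top-k feature selection standing in for Shrunken Centroids and SVM, and every name (`make_folds`, `fit`, `predict`, `cv_error`, `tune`, `nested_cv_error`), the dataset size, and the parameter grid are hypothetical.

```python
import random

def make_folds(y, n_folds, rng):
    """Stratified folds: shuffle each class, then deal indices round-robin."""
    folds = [[] for _ in range(n_folds)]
    pos = 0
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for i in idx:
            folds[pos % n_folds].append(i)
            pos += 1
    return folds

def fit(X, y, k):
    """Nearest-centroid model on the k features with the largest class-mean gap."""
    p = len(X[0])
    cent = {}
    for label in (0, 1):  # assumes binary labels 0/1, both present in y
        rows = [x for x, v in zip(X, y) if v == label]
        cent[label] = [sum(r[j] for r in rows) / len(rows) for j in range(p)]
    gap = [abs(cent[0][j] - cent[1][j]) for j in range(p)]
    keep = sorted(range(p), key=lambda j: -gap[j])[:k]
    return cent, keep

def predict(model, x):
    cent, keep = model
    dist = {c: sum((x[j] - cent[c][j]) ** 2 for j in keep) for c in (0, 1)}
    return 0 if dist[0] <= dist[1] else 1

def cv_error(X, y, k, folds):
    """Plain CV error for one fixed k; feature selection is redone in every fold."""
    wrong = 0
    for test in folds:
        held = set(test)
        train = [i for i in range(len(y)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train], k)
        wrong += sum(predict(model, X[i]) != y[i] for i in test)
    return wrong / len(y)

def tune(X, y, grid, folds):
    """Return (best k, its CV error): the optimized estimate the summary warns about."""
    errs = {k: cv_error(X, y, k, folds) for k in grid}
    best = min(grid, key=lambda k: errs[k])
    return best, errs[best]

def nested_cv_error(X, y, grid, n_outer, n_inner, rng):
    """Outer loop estimates error; tuning happens only on each outer training set."""
    wrong = 0
    for test in make_folds(y, n_outer, rng):
        held = set(test)
        train = [i for i in range(len(y)) if i not in held]
        Xtr, ytr = [X[i] for i in train], [y[i] for i in train]
        best_k, _ = tune(Xtr, ytr, grid, make_folds(ytr, n_inner, rng))
        model = fit(Xtr, ytr, best_k)
        wrong += sum(predict(model, X[i]) != y[i] for i in test)
    return wrong / len(y)

# 'Null' data: labels are unrelated to the features, so the true error is 0.5.
rng = random.Random(0)
n, p = 40, 50
y = [i % 2 for i in range(n)]
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
grid = [1, 5, 20, 50]  # hypothetical candidate values for k

_, tuned_err = tune(X, y, grid, make_folds(y, 5, rng))
nested_err = nested_cv_error(X, y, grid, 5, 4, rng)
print(f"optimized CV error: {tuned_err:.3f}")
print(f"nested CV error:    {nested_err:.3f}")
```

Because the labels carry no signal, any estimate well below 0.5 reflects selection bias rather than real predictive skill. A single small run like this one is noisy and may not show the gap clearly; repeating it over many simulated datasets, as the study did, is what makes the difference between the two estimates visible.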