Prediction error estimation: a comparison of resampling methods.

Annette M Molinaro¹, Richard Simon, Ruth M Pfeiffer

¹Biostatistics Branch, Division of Cancer Epidemiology and Genetics, NCI, NIH, Rockville, MD 20852, USA. annette.molinaro@yale.edu

Bioinformatics (Oxford, England)

May 21, 2005

Summary

This summary is machine-generated.

Estimating prediction error in genomic studies with feature selection is challenging. Leave-one-out cross-validation (LOOCV) and 10-fold cross-validation (CV) offer the least biased estimates for small sample sizes.

Area of Science:

Genomics
Biostatistics
Machine Learning

Background:

Genomic studies generate thousands of features from limited samples.
Classifiers are built to predict outcomes from these features.
Accurate prediction error estimation is crucial, especially with feature selection.

Purpose of the Study:

To compare methods for estimating prediction error in genomic studies.
To evaluate the bias of different resampling techniques in the presence of feature selection.
To identify optimal methods for prediction assessment in small sample sizes.

Main Methods:

Comparison of prediction error estimation methods.
Evaluation of resubstitution, split-sample, leave-one-out cross-validation (LOOCV), k-fold cross-validation (CV), and .632+ bootstrap.
Analysis of bias and mean square error across different resampling techniques.
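
To make the difference between these estimators concrete, here is a minimal sketch in plain Python. It is not the study's code: the nearest-centroid classifier, the mean-difference feature filter, and all function names are illustrative assumptions, and the demo data are pure noise. The key point it illustrates is that feature selection is repeated inside every cross-validation fold, whereas resubstitution selects features and scores the classifier on the same samples.

```python
import random

def top_features(X, y, m):
    """Return indices of the m features with the largest absolute
    difference between class means (a simple, hypothetical filter)."""
    p = len(X[0])
    scores = []
    for j in range(p):
        g0 = [row[j] for row, lab in zip(X, y) if lab == 0]
        g1 = [row[j] for row, lab in zip(X, y) if lab == 1]
        scores.append(abs(sum(g1) / len(g1) - sum(g0) / len(g0)))
    return sorted(range(p), key=lambda j: -scores[j])[:m]

def fit_centroids(X, y, feats):
    """Compute per-class centroids over the selected features."""
    cents = {}
    for lab in (0, 1):
        rows = [row for row, l in zip(X, y) if l == lab]
        cents[lab] = [sum(r[j] for r in rows) / len(rows) for j in feats]
    return cents

def predict(cents, feats, row):
    """Assign the class whose centroid is closest (squared Euclidean)."""
    def dist(lab):
        return sum((row[j] - c) ** 2 for j, c in zip(feats, cents[lab]))
    return 0 if dist(0) <= dist(1) else 1

def resubstitution_error(X, y, m):
    """Select features, train, and test on the same samples."""
    feats = top_features(X, y, m)
    cents = fit_centroids(X, y, feats)
    return sum(predict(cents, feats, r) != l for r, l in zip(X, y)) / len(y)

def kfold_error(X, y, m, k, seed=0):
    """k-fold CV with feature selection redone inside each fold.
    Assumes every training split contains both classes."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    wrong = 0
    for test in folds:
        held = set(test)
        Xtr = [X[i] for i in idx if i not in held]
        ytr = [y[i] for i in idx if i not in held]
        feats = top_features(Xtr, ytr, m)  # uses training samples only
        cents = fit_centroids(Xtr, ytr, feats)
        wrong += sum(predict(cents, feats, X[i]) != y[i] for i in test)
    return wrong / len(y)

def loocv_error(X, y, m):
    """Leave-one-out CV is k-fold CV with k equal to the sample size."""
    return kfold_error(X, y, m, k=len(y))

# Demo on pure noise: labels carry no signal, so an honest estimate
# should sit near 0.5, while resubstitution tends to look optimistic.
rng = random.Random(1)
n, p, m = 20, 200, 10
y = [i % 2 for i in range(n)]
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
print("resubstitution:", resubstitution_error(X, y, m))
print("10-fold CV:    ", kfold_error(X, y, m, k=10))
print("LOOCV:         ", loocv_error(X, y, m))
```

Selecting features once on the full data set and then cross-validating only the classifier would leak information from the held-out samples into the model, which is why the sketch calls `top_features` on each training split.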

Main Results:

Resubstitution and simple split-sample estimates are biased in small genomic studies.
LOOCV, 10-fold CV, and .632+ bootstrap show the smallest bias for certain models.
LOOCV, 5- and 10-fold CV, and .632+ bootstrap yield the lowest mean square error.
The .632+ bootstrap is biased in small samples with high signal-to-noise ratios.
Method performance differences decrease with increasing sample size.
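
For readers unfamiliar with the quantities compared above, the following is a sketch of the usual definitions. The notation is an assumption introduced here, not taken from this summary. The symbol \(\hat{\theta}_i\) denotes a resampling estimate of prediction error in simulation run \(i\), \(\tilde{\theta}_i\) the corresponding true error, and \(N\) the number of runs. The .632+ bootstrap is written in the standard form due to Efron and Tibshirani (1997), with \(\overline{\mathrm{err}}\) the resubstitution error, \(\widehat{\mathrm{Err}}^{(1)}\) the leave-one-out bootstrap error, and \(\hat{\gamma}\) the no-information error rate.

```latex
\[
\widehat{\mathrm{Bias}} = \frac{1}{N}\sum_{i=1}^{N}\bigl(\hat{\theta}_i - \tilde{\theta}_i\bigr),
\qquad
\widehat{\mathrm{MSE}} = \frac{1}{N}\sum_{i=1}^{N}\bigl(\hat{\theta}_i - \tilde{\theta}_i\bigr)^{2}
\]
\[
\hat{R} = \frac{\widehat{\mathrm{Err}}^{(1)} - \overline{\mathrm{err}}}{\hat{\gamma} - \overline{\mathrm{err}}},
\qquad
\hat{w} = \frac{0.632}{1 - 0.368\,\hat{R}},
\qquad
\widehat{\mathrm{Err}}^{(.632+)} = (1 - \hat{w})\,\overline{\mathrm{err}} + \hat{w}\,\widehat{\mathrm{Err}}^{(1)}
\]
```

In practice the relative overfitting rate \(\hat{R}\) is restricted to the interval \([0, 1]\), so the weight \(\hat{w}\) ranges from 0.632 (no overfitting) to 1 (severe overfitting).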

Conclusions:

LOOCV and k-fold CV are recommended for accurate prediction error estimation in small genomic studies.
The choice of method impacts the reliability of prediction error estimates.
Increasing sample size reduces the differences in performance among resampling methods.