Related Concept Videos

Mechanistic Models: Compartment Models in Algorithms for Numerical Problem Solving

Mechanistic models play a crucial role in algorithms for numerical problem-solving, particularly in nonlinear mixed effects modeling (NMEM). These models aim to minimize specific objective functions by evaluating various parameter estimates, leading to the development of systematic algorithms. In some cases, linearization techniques approximate the model using linear equations.
In individual population analyses, different algorithms are employed, such as Cauchy's method, which uses a...

Types of Errors: Detection and Minimization

Error is the deviation of the obtained result from the true, expected value or the estimated central value. Errors are expressed in absolute or relative terms.
Absolute error in a measurement is the numerical difference from the true or central value. Relative error is the ratio between absolute error and the true or central value, expressed as a percentage.
Errors can be classified by source, magnitude, and sign. There are three types of errors: systematic, random, and gross.
Systematic or...

Regression Toward the Mean

Regression toward the mean (“RTM”) is a phenomenon in which extremely high or low values (for example, an individual’s blood pressure at a particular moment) appear closer to a group’s average upon remeasurement. Although this statistical peculiarity is the result of random error and chance, it has been problematic across various medical, scientific, financial, and psychological applications. In particular, RTM, if not taken into account, can interfere when...

Calibration Curves: Linear Least Squares

A calibration curve is a plot of the instrument's response against a series of known concentrations of a substance. This curve is used to set the instrument response levels, using the substance and its concentrations as standards. Alternatively, or additionally, an equation is fitted to the calibration curve plot and subsequently used to calculate the unknown concentrations of other samples reliably.
For data that follow a straight line, the standard method for fitting is the linear...

Residuals and Least-Squares Property

A residual is the vertical distance between the actual value of y and the estimated value of y. In other words, it measures the vertical distance between the actual data point and the predicted point on the line.
If the observed data point lies above the line, the residual is positive, and the line underestimates the actual data value for y. If the observed data point lies below the line, the residual is negative, and the line overestimates the actual data value for y.
The process of fitting the best-fit...

Accuracy and Errors in Hypothesis Testing

Hypothesis testing is a fundamental statistical tool that begins with the assumption that the null hypothesis H0 is true. During this process, two types of errors can occur: Type I and Type II. A Type I error refers to the incorrect rejection of a true null hypothesis, while a Type II error involves the failure to reject a false null hypothesis.
In hypothesis testing, the probability of making a Type I error, denoted as α, is commonly set at 0.05. This significance level indicates a 5%...

Bias correction for selecting the minimal-error classifier from many machine learning models.

Ying Ding¹, Shaowu Tang², Serena G Liao²

¹Joint Carnegie Mellon University-University of Pittsburgh Ph.D. Program in Computational Biology, Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA 15261, USA and Magee-Womens Research Institute, Pittsburgh, PA 15213, USA.

Bioinformatics (Oxford, England)

August 3, 2014

Summary

This summary is machine-generated.

A new bias correction method using inverse power law (IPL) fitting improves machine learning classifier accuracy estimates in genomic studies. IPL outperforms existing methods, offering a practical way to assess if more data is needed for better predictions.

Related Experiment Videos

Last Updated: Apr 26, 2026

A Machine Learning Approach to Design an Efficient Selective Screening of Mild Cognitive Impairment

Published on: January 11, 2020

Selecting Multiple Biomarker Subsets with Similarly Effective Binary Classification Performances

Published on: October 11, 2018

Design and Analysis for Fall Detection System Simplification

Published on: April 6, 2020

Area of Science:

Genomic research
Machine learning
Biostatistics

Background:

Supervised machine learning is widely used in genomics for predictive classifier development.
Cross-validation is a common technique for estimating error rates when independent test datasets are unavailable.
The common practice of selecting the best-performing model by its cross-validation error leads to selection bias: the error reported for the selected model tends to be optimistically low, particularly in studies with moderate sample sizes.
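
The selection bias described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions that are not taken from the study: every candidate classifier has the same true error rate, and each error estimate is modeled as independent binomial noise (real cross-validation estimates from different classifiers on the same data are correlated). The function names and default values are hypothetical.

```python
import random
import statistics


def min_estimated_error(n_samples, n_models, true_error, rng):
    """Smallest estimated error among n_models classifiers sharing one true error."""
    estimates = []
    for _ in range(n_models):
        # Each sample is misclassified independently with probability true_error.
        mistakes = sum(rng.random() < true_error for _ in range(n_samples))
        estimates.append(mistakes / n_samples)
    return min(estimates)


def simulate_selection_bias(n_samples=50, n_models=20, true_error=0.3,
                            n_repeats=500, seed=0):
    """Average error reported for the 'best' model over many repeated studies."""
    rng = random.Random(seed)
    picked = [min_estimated_error(n_samples, n_models, true_error, rng)
              for _ in range(n_repeats)]
    return statistics.mean(picked)


if __name__ == "__main__":
    reported = simulate_selection_bias()
    print(f"true error: 0.300, mean error of the selected model: {reported:.3f}")
```

Because the minimum of several noisy estimates is taken, the reported value falls below the shared true error even though no classifier is actually better than the others. Setting n_models=1 removes the selection step and, with it, the downward bias.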

Purpose of the Study:

To address the selection bias in machine learning classifier error rate estimation.
To propose and evaluate a novel bias correction method for cross-validation in genomic research.

Main Methods:

Developed a bias correction method based on fitting learning curves using the inverse power law (IPL).
Compared the proposed IPL method with existing techniques: nested cross-validation, weighted mean correction, and the Tibshirani-Tibshirani procedure.
Evaluated methods using simulation datasets, moderate-sized real datasets, and large breast cancer datasets.
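
As a rough illustration of what fitting a learning curve with an inverse power law involves, the sketch below assumes the functional form error(n) = a * n^(-b) + c, where n is the training sample size and c is the asymptotic error. Both this parameterization and the fitting procedure (a grid search over c combined with least squares in log space) are assumptions made for illustration; they are not taken from the 'MLbias' package, and the function names are hypothetical.

```python
import math


def fit_inverse_power_law(sizes, errors, n_grid=200):
    """Fit errors ~ a * n**(-b) + c and return (a, b, c).

    The asymptote c is grid-searched in [0, min(errors)); for each candidate,
    a and b come from ordinary least squares of log(error - c) on log(n).
    All errors must be strictly positive.
    """
    log_n = [math.log(n) for n in sizes]
    mean_x = sum(log_n) / len(log_n)
    sxx = sum((x - mean_x) ** 2 for x in log_n)
    floor = min(errors)
    best = None
    for k in range(n_grid):
        c = floor * k / n_grid
        log_e = [math.log(e - c) for e in errors]
        mean_y = sum(log_e) / len(log_e)
        slope = sum((x - mean_x) * (y - mean_y)
                    for x, y in zip(log_n, log_e)) / sxx
        a, b = math.exp(mean_y - slope * mean_x), -slope
        # Score each candidate by squared error on the original scale.
        sse = sum((a * n ** (-b) + c - e) ** 2 for n, e in zip(sizes, errors))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    return best[1:]


def predict_error(n, a, b, c):
    """Error predicted by the fitted curve at sample size n."""
    return a * n ** (-b) + c


if __name__ == "__main__":
    # Synthetic, noise-free curve generated from known parameters (not real data).
    sizes = [20, 40, 80, 160, 320]
    errors = [predict_error(n, 2.0, 0.5, 0.1) for n in sizes]
    a, b, c = fit_inverse_power_law(sizes, errors)
    print(f"a={a:.3f}, b={b:.3f}, c={c:.3f}")
    print(f"extrapolated error at n=640: {predict_error(640, a, b, c):.3f}")
```

In practice the error values would come from cross-validation on subsamples of increasing size, and a nonlinear least-squares routine could replace the grid search.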

Main Results:

The inverse power law (IPL) method demonstrated superior performance in bias correction compared to existing methods.
IPL exhibited a smaller variance in error estimates.
IPL offers the advantage of extrapolating error estimates for larger sample sizes, aiding decisions on sample recruitment.
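
To show how an extrapolated learning curve could inform a recruitment decision, the sketch below inverts an inverse power law of the assumed form error(n) = a * n^(-b) + c to find the sample size at which a target error is reached. The parameterization, the function name, and the parameter values in the usage lines are illustrative assumptions, not results from the study.

```python
import math


def required_sample_size(target_error, a, b, c):
    """Smallest integer n with a * n**(-b) + c <= target_error.

    Returns None when the target is at or below the asymptote c, because the
    assumed curve never reaches it however many samples are added.
    """
    if target_error <= c:
        return None
    n = (a / (target_error - c)) ** (1.0 / b)
    return max(1, math.ceil(n))


if __name__ == "__main__":
    # Hypothetical fitted parameters, chosen only to exercise the function.
    a, b, c = 2.0, 0.5, 0.1
    for target in (0.30, 0.20, 0.05):
        print(target, required_sample_size(target, a, b, c))
```

A None result is itself informative: under the fitted curve, recruiting more samples would not bring the error down to the requested level.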

Conclusions:

The proposed IPL bias correction method is effective and offers practical advantages for genomic research.
The 'MLbias' R package and source files are available for public use, promoting reproducibility and application.