Reporting bias when using real data sets to analyze classification performance.

Mohammadmahdi R Yousefi¹, Jianping Hua, Chao Sima

¹Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.

Bioinformatics (Oxford, England), October 23, 2009

Summary

This summary is machine-generated.

Reporting only the best results from classification studies yields optimistically biased performance estimates. Simulations show that this reporting bias is substantial, even when the second-best rather than the best dataset is reported. For studies that use real data, researchers should report results for all datasets tested.

Area of Science:

Machine Learning
Bioinformatics
Statistical Modeling

Background:

Authors commonly propose new classification rules and demonstrate performance on high-dimensional, small-sample real datasets.
Variability in feature selection and error estimation leads to imprecise performance reporting.
Reporting only the best test results introduces bias relative to overall procedure performance.
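
The direction of this bias can be sketched with a short inequality. The notation here is an assumption made for illustration and is not taken from the article: a procedure is tested on m datasets, the estimated error on dataset i is written as \hat{\varepsilon}_i, and only the smallest estimate is reported.

```latex
\hat{\varepsilon}_{\mathrm{rep}} = \min_{1 \le i \le m} \hat{\varepsilon}_i,
\qquad
\mathbb{E}\big[\hat{\varepsilon}_{\mathrm{rep}}\big]
\;\le\; \min_{1 \le i \le m} \mathbb{E}\big[\hat{\varepsilon}_i\big]
\;\le\; \frac{1}{m} \sum_{i=1}^{m} \mathbb{E}\big[\hat{\varepsilon}_i\big]
```

The first inequality holds because the minimum is never larger than any single estimate. The reported value is therefore, on average, no worse than the expected error on the most favorable dataset, and the gap to the average over all datasets widens as the individual estimates become more variable.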

Purpose of the Study:

To characterize and quantify reporting bias in classification performance.
To evaluate bias across different classification rules and feature selection methods.
To provide recommendations for more reliable reporting practices.

Main Methods:

Conducted a large simulation study using modeled and real data.
Computed reporting bias statistics for various scenarios.
Tested linear discriminant analysis (LDA) and 3-nearest-neighbor (3NN) classification rules.
Evaluated filter (t-test) and wrapper (sequential forward search) feature selection methods.
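
As a rough illustration of one of the combinations listed above (t-test filter followed by 3NN), the following is a minimal sketch using only the Python standard library. It is not the article's experimental setup: the synthetic Gaussian data, the sample sizes, the number of features, and every function name (`make_dataset`, `t_scores`, `select_top`, `predict_3nn`, `holdout_error`) are assumptions chosen for illustration. LDA and the sequential forward search wrapper are not shown.

```python
import math
import random
import statistics


def make_dataset(n_per_class, n_features, n_informative, shift, rng):
    """Synthetic two-class Gaussian data (an assumption, not the paper's model).

    Only the first n_informative features differ between the classes.
    """
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [
                rng.gauss(shift * label if j < n_informative else 0.0, 1.0)
                for j in range(n_features)
            ]
            X.append(row)
            y.append(label)
    return X, y


def t_scores(X, y):
    """Absolute Welch t-statistic of each feature between the two classes."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        se = math.sqrt(
            statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
        )
        scores.append(abs(statistics.mean(a) - statistics.mean(b)) / se)
    return scores


def select_top(X, y, k):
    """Filter selection: indices of the k features with the largest |t|."""
    scores = t_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


def predict_3nn(X_train, y_train, x, features):
    """Majority vote of the 3 nearest training points on the selected features."""
    dists = sorted(
        (sum((row[j] - x[j]) ** 2 for j in features), lab)
        for row, lab in zip(X_train, y_train)
    )
    votes = sum(lab for _, lab in dists[:3])
    return 1 if votes >= 2 else 0


def holdout_error(n_train, n_test, n_features, k, rng):
    """Train on one small sample, estimate the error on an independent one."""
    X_tr, y_tr = make_dataset(n_train, n_features, 5, 1.0, rng)
    X_te, y_te = make_dataset(n_test, n_features, 5, 1.0, rng)
    feats = select_top(X_tr, y_tr, k)  # selection uses training data only
    wrong = sum(
        predict_3nn(X_tr, y_tr, x, feats) != lab for x, lab in zip(X_te, y_te)
    )
    return wrong / len(y_te)


rng = random.Random(0)
# Ten independent small-sample "studies" of the same procedure.
errors = [holdout_error(15, 50, 100, 5, rng) for _ in range(10)]
print(sorted(errors))
```

Printing the sorted errors shows how much the estimate for a single fixed procedure can vary from one small sample to the next, which is the variability that makes reporting only the best run misleading.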

Main Results:

Reporting bias was generally large, often exceeding what would be considered a significant difference in performance between methods.
Bias was quantified as a function of the number of samples tested.
Results were consistent across different classification rules and feature selection techniques.
Bias was observed when reporting the best or second-best performing dataset.
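
The qualitative pattern in these results can be reproduced with a small simulation. This is a sketch under simplifying assumptions that are not the article's design: every dataset has the same true error (0.25), each error estimate comes from 50 independent test points, and the function names `estimated_error` and `reporting_bias` are hypothetical.

```python
import random
import statistics


def estimated_error(true_error, n_test, rng):
    """Hold-out error estimate: fraction of n_test points misclassified."""
    wrong = sum(rng.random() < true_error for _ in range(n_test))
    return wrong / n_test


def reporting_bias(true_error, n_test, m, rank, trials, rng):
    """Average of (rank-th smallest estimate among m datasets) minus true error.

    rank=0 reports the best dataset, rank=1 the second best.
    """
    diffs = []
    for _ in range(trials):
        estimates = sorted(
            estimated_error(true_error, n_test, rng) for _ in range(m)
        )
        diffs.append(estimates[rank] - true_error)
    return statistics.mean(diffs)


rng = random.Random(1)
TRUE_ERROR, N_TEST, TRIALS = 0.25, 50, 1000  # illustrative values only

best_bias = {
    m: reporting_bias(TRUE_ERROR, N_TEST, m, 0, TRIALS, rng)
    for m in (1, 2, 5, 10)
}
second_bias = {
    m: reporting_bias(TRUE_ERROR, N_TEST, m, 1, TRIALS, rng)
    for m in (2, 5, 10)
}

for m in sorted(best_bias):
    print(m, round(best_bias[m], 3), second_bias.get(m))
```

With a single dataset the average bias is close to zero, while the best-of-m and second-best-of-m estimates become increasingly optimistic as m grows. Reporting the mean over all m datasets avoids this selection effect.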

Conclusions:

There is a substantial reporting bias when only top-performing datasets are presented.
A centralized database of datasets is recommended for comprehensive evaluation.
For studies using real data, results should be reported for all datasets tested.