Assessing Outlier Probabilities in Transcriptomics Data When Evaluating a Classifier.

Magdalena Kircher¹, Josefin Säurich¹, Michael Selle¹

  • ¹ Institute for Animal Breeding and Genetics, University of Veterinary Medicine Hannover, D-30559 Hannover, Germany.

Genes | February 25, 2023
Summary
This summary is machine-generated.

Outliers in transcriptomics data can substantially change estimates of classifier performance. In most cases, removing outliers improved classification results, which supports more reliable models for clinical use.

Keywords:
bagplot, bootstrap, outlier detection, outlier probability, robust learning, transcriptomics data

Area of Science:

  • Bioinformatics
  • Computational Biology
  • Genomics

Background:

  • Outliers in transcriptomics datasets can lead to inaccurate classifier performance estimates.
  • Models built on such estimates may fail to perform consistently on new data, which calls their clinical utility into question.

Purpose of the Study:

  • To investigate the impact of outliers on transcriptomics classifier performance.
  • To develop and apply a robust method for outlier detection and removal in transcriptomics data analysis.

Main Methods:

  • Utilized simulated gene expression data with artificial outliers and two real-world datasets.
  • Employed a bootstrap procedure with two outlier detection methods to estimate sample outlier probabilities.
  • Evaluated classifier performance using cross-validation before and after outlier removal.
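
The bootstrap step above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the keywords name the bagplot as one detector, whereas this sketch swaps in a simple median/MAD rule on each sample's distance to the coordinate-wise median. The function names (`mad_flags`, `outlier_probabilities`) and the parameters `n_boot`, `k`, and `seed` are assumptions introduced here for illustration.

```python
import math
import random
import statistics


def mad_flags(scores, k=3.0):
    """Flag scores lying more than k scaled MADs above the median.

    Stand-in detector (assumption): the study used other detection methods.
    """
    med = statistics.median(scores)
    mad = statistics.median([abs(s - med) for s in scores])
    scale = 1.4826 * mad  # makes the MAD comparable to a standard deviation
    if scale == 0:
        return [False] * len(scores)  # degenerate resample with no spread
    return [(s - med) / scale > k for s in scores]


def outlier_probabilities(samples, n_boot=200, k=3.0, seed=0):
    """Estimate, per sample, how often it is flagged when it is drawn.

    samples: list of equal-length expression vectors (one per sample).
    Returns flagged_count / drawn_count for each sample.
    """
    rng = random.Random(seed)
    n = len(samples)
    n_genes = len(samples[0])
    drawn = [0] * n
    flagged = [0] * n
    for _ in range(n_boot):
        # Resample sample indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        # Robust centre of the resample: coordinate-wise median.
        center = [
            statistics.median([samples[i][g] for i in idx])
            for g in range(n_genes)
        ]
        scores = [math.dist(samples[i], center) for i in idx]
        flag_by_index = dict(zip(idx, mad_flags(scores, k)))
        # Count each distinct sample once per resample.
        for i, is_outlier in flag_by_index.items():
            drawn[i] += 1
            flagged[i] += is_outlier
    return [f / d if d else float("nan") for f, d in zip(flagged, drawn)]
```

A sample whose estimated probability is close to 1 is flagged in nearly every resample that contains it; a cut-off on that probability (chosen by the analyst) then decides which samples to remove before re-running cross-validation.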

Main Results:

  • Outlier removal notably altered classification performance across datasets.
  • In most cases, removing outliers led to improved classification results.
  • The study highlights the variability in classifier performance when outliers are present.

Conclusions:

  • Always report transcriptomics classifier performance both with and without outliers.
  • This practice provides a comprehensive view of model robustness and prevents the reporting of potentially non-reproducible models.
  • Reporting both results supports greater reliability for classifiers intended for clinical diagnosis.
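
One way to follow the recommendation to report performance both with and without outliers is sketched below. It is only an illustration under stated assumptions: a nearest-centroid classifier and leave-one-out cross-validation stand in for whatever classifier and cross-validation scheme a given study uses, and `outlier_prob` is assumed to be a precomputed per-sample outlier probability. The function names and the `threshold` parameter are hypothetical.

```python
import math


def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose class centroid is closest to x."""
    centroids = {}
    for label in set(train_y):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(centroids, key=lambda lab: math.dist(x, centroids[lab]))


def loo_accuracy(X, y):
    """Leave-one-out cross-validated accuracy of the stand-in classifier."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += nearest_centroid_predict(train_X, train_y, X[i]) == y[i]
    return hits / len(X)


def report_with_and_without(X, y, outlier_prob, threshold=0.5):
    """Return accuracy on all samples and after dropping likely outliers."""
    keep = [i for i, p in enumerate(outlier_prob) if p <= threshold]
    return {
        "all_samples": loo_accuracy(X, y),
        "outliers_removed": loo_accuracy(
            [X[i] for i in keep], [y[i] for i in keep]
        ),
        "n_removed": len(X) - len(keep),
    }
```

Returning both numbers side by side, together with the count of removed samples, keeps the effect of outlier removal visible instead of reporting only the more favourable figure.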