Unsupervised outlier profile analysis.

Debashis Ghosh¹, Song Li²

¹Department of Biostatistics and Informatics, Colorado School of Public Health, Aurora, CO, USA.

Cancer Informatics

December 3, 2014

Summary

This summary is machine-generated.

This study introduces new statistical methods for identifying outlier genes in high-throughput genomic data. The methods are unsupervised, apply to continuous gene expression measurements, and are aimed at cancer genomics studies.

Keywords:

biomarkers; genomic data integration; heterogeneity; microarray; mixture model; tumor subtypes

Area of Science:

Genomics
Statistical Bioinformatics
Biostatistics

Background:

High-throughput genomic data analysis often relies on differential expression to identify significant genes.
Current methods primarily focus on mean expression changes, potentially overlooking complex expression patterns.
Existing outlier detection methods may have limitations, particularly with continuous genomic data.

Purpose of the Study:

To develop and evaluate new statistical methods for unsupervised outlier gene detection in high-throughput genomic data.
To adapt C(α) tests for outlier expression analysis, addressing limitations with continuous data.
To extend methods for analyzing matched samples across multiple genomic data platforms.

Main Methods:

Exploration of C(α) tests for outlier expression analysis.
Development of novel unsupervised statistics analogous to existing outlier profile analysis.
Simulation studies to assess the performance of proposed methods.
Application of a bivariate extension to analyze matched-sample data from two platforms.
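
This summary does not give the formulas for the proposed statistics, so the following is only a minimal sketch of what an unsupervised per-gene outlier score can look like. It is not the authors' method. It assumes a generic outlier-sum style score: each gene is standardized by its median and median absolute deviation (MAD) across all samples, with no class labels, and standardized values above the upper quartile plus one interquartile range are summed. The function name `outlier_sum_scores`, the cutoff rule, and the toy data are illustrative assumptions.

```python
import numpy as np


def outlier_sum_scores(expr):
    """Score each gene by the sum of its unusually high standardized values.

    expr: array of shape (genes, samples). No class labels are used,
    so the score is unsupervised. Illustrative sketch only.
    """
    expr = np.asarray(expr, dtype=float)

    # Robust per-gene centering and scaling: median and MAD across samples.
    med = np.median(expr, axis=1, keepdims=True)
    mad = np.median(np.abs(expr - med), axis=1, keepdims=True)
    mad = np.where(mad == 0, 1.0, mad)  # guard against division by zero
    z = (expr - med) / mad

    # Assumed cutoff: upper quartile plus one interquartile range, per gene.
    q1, q3 = np.percentile(z, [25, 75], axis=1, keepdims=True)
    cutoff = q3 + (q3 - q1)

    # Sum only the standardized values that exceed the cutoff.
    return np.where(z > cutoff, z, 0.0).sum(axis=1)


# Hypothetical toy data: gene 0 has no unusual sample,
# gene 1 has a single strongly over-expressed sample.
expr = np.array([
    [0, 1, 2, 3, 4, 5, 6, 7, 8],
    [0, 1, 2, 3, 4, 5, 6, 7, 50],
])
scores = outlier_sum_scores(expr)
print(scores)  # gene 0 scores 0, gene 1 scores 23
```

Genes would then be ranked by score, with high scores flagging genes that are over-expressed in only a subset of samples. A bivariate analogue for matched samples from two platforms would need a joint score across both measurements, which this sketch does not attempt.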

Main Results:

Proposed methods provide an unsupervised alternative for identifying outlier genes.
Simulation studies demonstrate the utility of the new statistics.
Bivariate extension successfully accommodates multi-platform data from matched samples.

Conclusions:

The developed statistical approaches offer a valuable tool for identifying outlier genes in genomic studies.
These methods enhance the analysis of complex gene expression patterns, particularly in cancer research.
The bivariate extension facilitates integrated analysis of multi-platform genomic data.