Unsupervised outlier profile analysis.

Debashis Ghosh1, Song Li2

  • 1Department of Biostatistics and Informatics, Colorado School of Public Health, Aurora, CO, USA.

Cancer Informatics | December 3, 2014
Summary
This summary is machine-generated.

This study introduces novel statistical methods for identifying outlier genes in high-throughput genomic data, offering an unsupervised approach for analyzing gene expression patterns and improving cancer research findings.

Keywords:
biomarkers, genomic data integration, heterogeneity, microarray, mixture model, tumor subtypes

Area of Science:

  • Genomics
  • Statistical Bioinformatics
  • Biostatistics

Background:

  • High-throughput genomic data analysis often relies on differential expression to identify significant genes.
  • Current methods primarily focus on mean expression changes, potentially overlooking complex expression patterns.
  • Existing outlier detection methods may have limitations, particularly with continuous genomic data.

Purpose of the Study:

  • To develop and evaluate new statistical methods for unsupervised outlier gene detection in high-throughput genomic data.
  • To adapt C(α) tests for outlier expression analysis, addressing limitations with continuous data.
  • To extend methods for analyzing matched samples across multiple genomic data platforms.

Main Methods:

  • Exploration of C(α) tests for outlier expression analysis.
  • Development of novel unsupervised statistics analogous to existing outlier profile analysis.
  • Simulation studies to assess the performance of proposed methods.
  • Application of a bivariate extension to analyze matched-sample data from two platforms.
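
To make the idea of an unsupervised outlier statistic concrete, here is a minimal Python sketch of a generic label-free "outlier sum" score per gene. It is an illustration only and is not the C(α)-based statistics described in the paper: the function name `outlier_scores`, the median/MAD standardization, and the Q3 + IQR cutoff are all assumptions chosen for the example.

```python
import numpy as np

def outlier_scores(expr):
    """Hypothetical label-free outlier-sum score, one value per gene.

    expr: 2-D array, rows = genes, columns = samples.
    NOTE: this is an illustrative stand-in, not the paper's statistic.
    """
    expr = np.asarray(expr, dtype=float)

    # Robustly standardize each gene with its median and MAD.
    med = np.median(expr, axis=1, keepdims=True)
    mad = np.median(np.abs(expr - med), axis=1, keepdims=True)
    mad = np.where(mad == 0, 1.0, mad)  # guard against constant genes
    z = (expr - med) / mad

    # Assumed cutoff: third quartile plus one interquartile range.
    q1, q3 = np.percentile(z, [25, 75], axis=1, keepdims=True)
    cutoff = q3 + (q3 - q1)

    # Sum the standardized values that exceed the cutoff.
    return np.where(z > cutoff, z, 0.0).sum(axis=1)

# Toy data: the second gene has two samples with strongly elevated expression.
flat = np.arange(20.0)
spiked = flat.copy()
spiked[-2:] = [100.0, 120.0]
scores = outlier_scores(np.vstack([flat, spiked]))
ranking = np.argsort(scores)[::-1]  # genes ordered from most to least outlying
```

No class labels enter the computation, which is what makes the score unsupervised; genes whose expression is elevated in only a subset of samples receive large scores, while genes without such a subset score zero.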

Main Results:

  • Proposed methods provide an unsupervised alternative for identifying outlier genes.
  • Simulation studies demonstrate the utility of the new statistics.
  • Bivariate extension successfully accommodates multi-platform data from matched samples.

Conclusions:

  • The developed statistical approaches offer a valuable tool for identifying outlier genes in genomic studies.
  • These methods enhance the analysis of complex gene expression patterns, particularly in cancer research.
  • The bivariate extension facilitates integrated analysis of multi-platform genomic data.