Outlier Identification in Model-Based Cluster Analysis.

Katie Evans¹, Tanzy Love², Sally W Thurston²

¹Dupont, DuET Applied Statistics, Delaware USA.

Journal of Classification

January 26, 2016

Summary

This summary is machine-generated.

This study introduces a novel method for identifying outlier observations in normal-mixture model-based clustering. The approach effectively detects true outliers, improving clustering accuracy and robustness.

Keywords:

Influential points; MCLUST; National Hockey League; Normal-mixture models; Prior

Area of Science:

Statistics
Data Mining
Machine Learning

Background:

Model-based clustering using normal-mixture models can be sensitive to outlying observations, potentially distorting cluster structure and count.
Existing methods may struggle with accurate outlier identification in complex datasets.

Purpose of the Study:

To develop and validate a robust method for identifying outlier observations within normal-mixture model-based clusters.
To enhance the reliability of clustering results by mitigating the influence of outliers.

Main Methods:

The proposed method flags an observation as an outlier if it belongs to a cluster with a minimal membership proportion, or if the cluster-specific variance changes substantially when that observation is removed.
Clustering uses the MCLUST R package, with a modified prior on the cluster-specific variance to prevent degeneracies in estimation.
The method was evaluated through simulation studies and compared with existing outlier detection techniques.
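
The two criteria above can be illustrated with a minimal Python sketch. This is not the authors' procedure: the paper works with fitted normal-mixture models in MCLUST, whereas this sketch assumes one-dimensional data with cluster labels already assigned and does not refit any model. The function names and the thresholds `min_proportion` and `max_variance_ratio` are hypothetical and chosen only for illustration.

```python
def membership_proportions(labels):
    """Return the fraction of observations assigned to each cluster label."""
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def flag_outliers(data, labels, min_proportion=0.1, max_variance_ratio=3.0):
    """Flag indices by two illustrative criteria (thresholds are assumptions).

    1. Every member of a cluster whose membership proportion is below
       min_proportion is flagged.
    2. Within larger clusters, an observation is flagged when the cluster
       variance with it is more than max_variance_ratio times the variance
       without it.
    """
    flagged = set()
    for label, proportion in membership_proportions(labels).items():
        members = [i for i, lab in enumerate(labels) if lab == label]
        if proportion < min_proportion:
            flagged.update(members)
            continue
        if len(members) < 3:
            continue  # too few points for a leave-one-out comparison
        full_var = variance([data[i] for i in members])
        for i in members:
            rest_var = variance([data[j] for j in members if j != i])
            if rest_var > 0 and full_var / rest_var > max_variance_ratio:
                flagged.add(i)
    return sorted(flagged)


# Toy data: two tight clusters, one stray point in cluster 0, one singleton cluster.
data = [1.0, 1.1, 0.9, 1.05, 0.95, 6.0,
        10.0, 10.2, 9.8, 10.1, 9.9, 10.05,
        50.0]
labels = [0] * 6 + [1] * 6 + [2]
flagged = flag_outliers(data, labels)
print(flagged)
```

On this toy data the stray value 6.0 inflates the variance of its cluster, and the singleton cluster falls below the membership threshold, so both are flagged. A fuller treatment would refit the mixture model after removing each candidate rather than hold the labels fixed.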

Main Results:

In simulations, the method detects true outliers accurately while keeping false positives low.
It outperforms other approaches to outlier detection across various simulated scenarios.
Applied to National Hockey League data, it gives results comparable to published findings.

Conclusions:

The outlier identification method improves the robustness of normal-mixture model-based clustering.
It offers a useful tool for analyses in which outlier detection is critical to accurate cluster interpretation.
The modified prior on the cluster-specific variance avoids degeneracies in the estimation procedure.
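
Why a prior on the variance helps can be shown with a small Python sketch. This summary does not give the form of the paper's modified prior, so the sketch substitutes a generic inverse-gamma prior with a known cluster mean. The function names and the default `shape` and `scale` values are assumptions made for illustration. The maximum-likelihood variance of a cluster that collapses onto a single point is zero, which makes the normal likelihood unbounded, while the posterior mode under the prior stays strictly positive.

```python
def ml_variance(values, mean):
    """Maximum-likelihood variance estimate for a known mean."""
    return sum((v - mean) ** 2 for v in values) / len(values)


def map_variance(values, mean, shape=2.0, scale=1.0):
    """Posterior mode of the variance under an inverse-gamma(shape, scale) prior.

    With a known mean, the posterior is inverse-gamma with parameters
    (shape + n/2, scale + SS/2), whose mode is
    (scale + SS/2) / (shape + n/2 + 1).
    """
    n = len(values)
    sum_squares = sum((v - mean) ** 2 for v in values)
    return (scale + sum_squares / 2) / (shape + n / 2 + 1)


# A cluster that has collapsed onto a single observation.
singleton = [4.2]
ml_estimate = ml_variance(singleton, mean=4.2)
map_estimate = map_variance(singleton, mean=4.2)
print(ml_estimate, map_estimate)
```

Here the maximum-likelihood estimate is exactly zero, whereas the regularized estimate equals `scale / (shape + 1.5)`, so the fitted component cannot degenerate to a point mass.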