



Outlier Identification in Model-Based Cluster Analysis.

Katie Evans¹, Tanzy Love², Sally W Thurston²

  • ¹Dupont, DuET Applied Statistics, Delaware, USA.

Journal of Classification | January 26, 2016
Summary
This summary is machine-generated.

This study introduces a novel method for identifying outlier observations in normal-mixture model-based clustering. The approach effectively detects true outliers, improving clustering accuracy and robustness.

Keywords:
Influential points, MCLUST, National Hockey League, Normal-mixture models, Prior


Area of Science:

  • Statistics
  • Data Mining
  • Machine Learning

Background:

  • Model-based clustering using normal-mixture models can be sensitive to outlying observations, potentially distorting cluster structure and count.
  • Existing methods may struggle with accurate outlier identification in complex datasets.

Purpose of the Study:

  • To develop and validate a robust method for identifying outlier observations within normal-mixture model-based clusters.
  • To enhance the reliability of clustering results by mitigating the influence of outliers.

Main Methods:

  • The proposed method identifies outliers based on minimal cluster membership proportion or significant changes in cluster-specific variance.
  • The method uses the MCLUST R package with a modified prior on the cluster-specific variance to prevent degenerate estimates.
  • It was evaluated through simulation studies and compared with existing outlier detection techniques.
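
The two criteria named above (a very small cluster membership proportion, or a large change in cluster-specific variance) can be illustrated with a minimal Python sketch. This is not the authors' procedure: it does not fit a normal-mixture model or use MCLUST, it takes hard cluster labels as given, and it works on one-dimensional data only. The function name `flag_outliers` and the thresholds `min_proportion` and `variance_ratio` are hypothetical choices made for illustration; the summary does not state how the paper sets its cutoffs.

```python
from statistics import variance


def flag_outliers(values, labels, min_proportion=0.1, variance_ratio=0.5):
    """Return sorted indices of observations flagged as potential outliers.

    Illustrative assumptions (not taken from the paper):
      - `labels` are hard cluster assignments supplied by the caller;
      - a cluster holding less than `min_proportion` of all observations
        is flagged in full;
      - an observation is flagged when removing it shrinks its cluster's
        sample variance to less than `variance_ratio` times the original.
    """
    n = len(values)

    # Group observation indices by cluster label.
    clusters = {}
    for i, label in enumerate(labels):
        clusters.setdefault(label, []).append(i)

    flagged = set()
    for indices in clusters.values():
        # Criterion 1: minimal membership proportion.
        if len(indices) / n < min_proportion:
            flagged.update(indices)
            continue

        # The leave-one-out variance needs at least two remaining points.
        if len(indices) < 3:
            continue

        full_var = variance([values[i] for i in indices])
        if full_var == 0:
            continue  # identical values: no point can reduce the variance

        # Criterion 2: large drop in cluster variance when a point is removed.
        for i in indices:
            rest = [values[j] for j in indices if j != i]
            if variance(rest) / full_var < variance_ratio:
                flagged.add(i)

    return sorted(flagged)


# Toy data: two clusters plus one isolated observation.
values = [1.0, 1.1, 0.9, 1.2, 0.8, 1.0, 3.0,            # cluster 0, 3.0 is far out
          5.0, 5.1, 4.9, 5.2, 4.8, 5.0, 5.1, 4.9, 5.0,  # cluster 1, tight
          5.0, 5.1, 4.9,
          30.0]                                          # cluster 2, singleton
labels = [0] * 7 + [1] * 12 + [2]

print(flag_outliers(values, labels))  # → [6, 19]
```

In the toy data, index 6 is flagged by the variance criterion (dropping it reduces the variance of cluster 0 by more than half), and index 19 is flagged because its cluster holds only 1 of 20 observations. In a model-based setting the labels and variances would come from the fitted mixture rather than from fixed assignments.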

Main Results:

  • The developed method demonstrates high accuracy in detecting true outliers while minimizing false positives.
  • Outperforms other approaches in outlier detection across various simulated scenarios.
  • When applied to National Hockey League data, the method gives results comparable to published findings.

Conclusions:

  • The novel outlier identification method significantly improves the robustness of normal-mixture model-based clustering.
  • Offers a valuable tool for data analysis where outlier detection is critical for accurate cluster interpretation.
  • The modified prior for variance estimation addresses key challenges in model-based clustering procedures.