Related Concept Videos

Quantifying and Rejecting Outliers: The Grubbs Test

Sometimes a data set contains a recorded numerical observation that deviates greatly from the rest of the data. Assuming the data are normally distributed, a statistical method called the Grubbs test can be used to determine whether that observation is truly an outlier. To perform a two-tailed Grubbs test, first calculate the absolute difference between the suspected outlier and the sample mean. Then calculate the ratio of this difference to the sample standard deviation. This...
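The two steps above amount to the test statistic G = max|x_i - x̄| / s. The minimal Python sketch below computes G with only the standard library and compares it with a critical value supplied by the caller; it assumes that value is looked up in a published Grubbs table, since computing it requires the t distribution, which the standard library does not provide. The function names `grubbs_statistic` and `is_grubbs_outlier` and the sample data are hypothetical.

```python
import statistics


def grubbs_statistic(data):
    """Return (G, index) for the observation farthest from the sample mean.

    G = max_i |x_i - mean| / s, with s the sample (n - 1) standard deviation.
    """
    if len(data) < 3:
        raise ValueError("the Grubbs test needs at least 3 observations")
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    if s == 0:
        raise ValueError("all observations are identical; G is undefined")
    # The suspected outlier is the value with the largest absolute deviation
    idx = max(range(len(data)), key=lambda i: abs(data[i] - mean))
    return abs(data[idx] - mean) / s, idx


def is_grubbs_outlier(data, g_critical):
    """True if G exceeds a critical value looked up from a Grubbs table."""
    g, _ = grubbs_statistic(data)
    return g > g_critical


data = [10.1, 10.3, 9.9, 10.0, 10.2, 10.1, 9.8, 10.0, 10.2, 12.5]
g, idx = grubbs_statistic(data)
# 2.290: tabulated two-sided critical value for n = 10 at alpha = 0.05
print(f"G = {g:.3f} at index {idx}, outlier: {is_grubbs_outlier(data, 2.290)}")
```

Here G is about 2.79 for the value 12.5, above the tabulated 2.290, so it would be rejected. Note that G can never exceed (n - 1)/√n, which is one reason small samples rely on tabulated critical values rather than a fixed cutoff.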

Outliers and Influential Points

An outlier is an observation that does not fit the rest of the data; it is sometimes called an extreme value. When you graph an outlier, it will appear not to fit the pattern of the graph. Some outliers are due to mistakes (for example, writing down 50 instead of 500), while others may indicate that something unusual is happening. Outliers lie far from the least-squares line in the vertical direction. They have large "errors," where the "error" or residual is the...
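To make "far from the least-squares line in the vertical direction" concrete, the sketch below fits an ordinary least-squares line, computes each residual (observed y minus predicted y), and flags points whose absolute residual exceeds two residual standard deviations. The two-standard-deviation cutoff is an assumption (a common textbook rule of thumb; the snippet above is cut off before stating its own criterion), and the function names and data are hypothetical.

```python
import math


def least_squares_line(x, y):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def residual_outliers(x, y, k=2.0):
    """Indices whose |residual| exceeds k times the residual standard deviation."""
    slope, intercept = least_squares_line(x, y)
    # Vertical distance of each point from the fitted line
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # n - 2 degrees of freedom: slope and intercept were both estimated
    s = math.sqrt(sum(r * r for r in residuals) / (len(x) - 2))
    return [i for i, r in enumerate(residuals) if abs(r) > k * s]


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 30.0, 14.2, 15.9]
print(residual_outliers(x, y))
```

Only the point at index 5 (x = 6, y = 30.0) is flagged. Because that point also pulls the fitted line toward itself and inflates s, a single extreme value can partly mask itself, which is one motivation for robust methods.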

Detection of Gross Error: The Q Test

When one or more data points appear far from the rest of the data, there is a need to determine whether they are outliers and whether they should be eliminated from the data set to ensure an accurate representation of the measured value. In many cases, outliers arise from gross errors (or human errors) and do not accurately reflect the underlying phenomenon. In some cases, however, these apparent outliers reflect true phenomenological differences. In these cases, we can use statistical methods...
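The snippet stops before giving the formula. As commonly stated, Dixon's Q test sorts the data, divides the gap between the suspect value and its nearest neighbor by the range of the data, and rejects the suspect value if Q exceeds a tabulated critical value. A minimal standard-library sketch under that assumption follows; the function name, the data, and the critical value passed in are illustrative.

```python
def dixon_q(data):
    """Return (Q, suspect) for whichever end of the sorted data is more extreme.

    Q = gap / range, where gap is the distance from the suspect value to its
    nearest neighbor and range is max - min.
    """
    if len(data) < 3:
        raise ValueError("the Q test needs at least 3 observations")
    xs = sorted(data)
    spread = xs[-1] - xs[0]
    if spread == 0:
        raise ValueError("all observations are identical; Q is undefined")
    q_low = (xs[1] - xs[0]) / spread
    q_high = (xs[-1] - xs[-2]) / spread
    # Test the end of the data set with the larger gap
    return (q_high, xs[-1]) if q_high >= q_low else (q_low, xs[0])


data = [0.1014, 0.1012, 0.1019, 0.1016, 0.1068]
q, suspect = dixon_q(data)
# Compare with the tabulated critical value for n = 5 (about 0.710 at 95%)
print(f"Q = {q:.3f} for {suspect}; reject at 95%: {q > 0.710}")
```

With these five values Q = 0.0049 / 0.0056 = 0.875 for 0.1068, which exceeds the 95% critical value for n = 5 found in commonly used Q tables, so 0.1068 would be rejected. Check the table and confidence level you are actually working with before relying on the cutoff.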

What Are Outliers?

Outliers are observed data points that lie far from the least-squares line. They have unusual values and need to be examined carefully. Although an outlier may result from erroneous data, it may instead hold valuable information about the population under study, in which case it should be kept in the data. Hence, it is crucial to examine what causes a data point to be an outlier.
The z score is used to find outliers or unusual values. Note that any values with z scores beyond -2 and +2 are...
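A z score standardizes a value as z = (x - x̄) / s. The sketch below assumes the sample mean, the sample standard deviation, and the ±2 cutoff from the text, and flags values whose z score falls outside that band; the function names and data are hypothetical.

```python
import statistics


def z_scores(data):
    """Standardize each value with the sample mean and sample standard deviation."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    return [(x - mean) / s for x in data]


def unusual_by_z(data, cutoff=2.0):
    """Indices of values whose z score is below -cutoff or above +cutoff."""
    return [i for i, z in enumerate(z_scores(data)) if abs(z) > cutoff]


data = [12, 14, 13, 15, 14, 13, 40, 12, 14, 13]
print(unusual_by_z(data))
```

Only the value 40 (z ≈ 2.83) is flagged. One caveat: the outlier itself inflates s, and with n observations no |z| can exceed (n - 1)/√n (about 2.85 for n = 10), so in very small samples a fixed cutoff can miss real outliers.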

Modified Boxplots

A standard box and whisker plot informs us about the spread of the data in a given sample. One can identify the minimum value, maximum value, first quartile value, second quartile or median value, and third quartile.
However, a standard box plot does not show outliers, that is, values that lie far from the center of the data. We can modify the standard box and whisker plot to identify outliers and visualize the actual spread of the data in a sample.
Initially, we calculate the adjusted...
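The snippet breaks off before the details. A common construction for a modified box plot, assumed here, uses the interquartile range IQR = Q3 - Q1: values below Q1 - 1.5·IQR or above Q3 + 1.5·IQR are plotted individually as outliers, and the whiskers stop at the most extreme values inside those fences. The sketch uses `statistics.quantiles`, whose default "exclusive" method is only one of several quartile conventions, so Q1 and Q3 may differ slightly from a textbook or calculator. The function name and data are hypothetical.

```python
import statistics


def modified_boxplot(data, k=1.5):
    """Quartiles, fences, whisker ends, and outliers for a modified box plot."""
    # Default 'exclusive' method; other quartile conventions give slightly
    # different Q1 and Q3 values.
    q1, median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "fences": (lower_fence, upper_fence),
        # Whiskers stop at the most extreme values that are not outliers
        "whiskers": (min(inside), max(inside)),
        "outliers": sorted(outliers),
    }


summary = modified_boxplot([2, 4, 5, 5, 6, 7, 7, 8, 9, 30])
print(summary["whiskers"], summary["outliers"])
```

For this data set Q1 = 4.75 and Q3 = 8.25, so the fences are -0.5 and 13.5; the whiskers run from 2 to 9, and 30 is drawn as a separate outlier point.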

Unusual Results

Unusual results are those that have a very low chance of occurring. They can be identified using probabilities and the range rule of thumb. In probability problems, unusual results can appear in two ways: an unusually high number of successes or an unusually low number of successes.
According to the range rule of thumb, any value more than two standard deviations (2σ) above or below the mean (μ) is considered unusual.
Maximum usual value =...
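For a binomial setting (an assumption; the text does not name the distribution), the mean is μ = np and the standard deviation is σ = √(np(1 - p)), so the range rule of thumb places usual values between μ - 2σ and μ + 2σ. A minimal sketch with hypothetical function names:

```python
import math


def usual_range(n, p):
    """Range rule of thumb for a binomial count: (mu - 2*sigma, mu + 2*sigma)."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return mu - 2 * sigma, mu + 2 * sigma


def is_unusual(successes, n, p):
    """True if the number of successes falls outside the usual range."""
    low, high = usual_range(n, p)
    return successes < low or successes > high


# 100 tosses of a fair coin
print(usual_range(100, 0.5))
print(is_unusual(65, 100, 0.5), is_unusual(55, 100, 0.5))
```

For 100 tosses of a fair coin, μ = 50 and σ = 5, so the usual range is 40 to 60: 65 heads would be unusual, while 55 would not.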

You might also read

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

Orthonormal pairwise logratio selection (OPALS) algorithm for compositional data analysis in high dimensions.

Bioinformatics advances·2025

Same author

The impact of misclassifications and outliers on imputation methods.

Journal of applied statistics·2024

Same author

Visual Parameter Selection for Spatial Blind Source Separation.

Computer graphics forum : journal of the European Association for Computer Graphics·2022

Same author

Privacy of Study Participants in Open-access Health and Demographic Surveillance System Data: Requirements Analysis for Data Anonymization.

JMIR public health and surveillance·2022

Same author

Robust principal component analysis for compositional tables.

Journal of applied statistics·2022

Same author

Statistical Analysis of Chemical Element Compositions in Food Science: Problems and Possibilities.

Molecules (Basel, Switzerland)·2021

Same journal

Elastic functional Cox regression model with shape predictors.

Journal of applied statistics·2026

Same journal

An improved two-stage binary relevance method for multilabel classification.

Journal of applied statistics·2026

Same journal

Classification of multivariate functional data with an application to ADHD fMRI data.

Journal of applied statistics·2026

Same journal

Assessing the performance of longitudinal T-lymphocytes as biomarkers of immune recovery in HIV-infected children with or without TB co-infection.

Journal of applied statistics·2026

Same journal

Sparse long-only Markowitz portfolio optimization.

Journal of applied statistics·2026

Same journal

Homogeneity of multinomial populations when data are classified into a large number of groups.

Journal of applied statistics·2026

Related Experiment Video

Updated: Sep 8, 2025

A Cross-Disciplinary and Multi-Modal Experimental Design for Studying Near-Real-Time Authentic Examination Experiences

Published on: September 4, 2019

Evaluation of robust outlier detection methods for zero-inflated complex data.

M Templ¹,², J Gussenbauer³, P Filzmoser²

¹Zurich University of Applied Sciences, Winterthur, Switzerland.

Journal of Applied Statistics | June 16, 2022

Summary

This summary is machine-generated.

Robust multivariate outlier detection methods effectively identify true outliers in complex datasets with zeros and compositional variables. These advanced techniques outperform univariate methods, improving data preprocessing for economic indicators such as Purchasing Power Parity.

Keywords:

62H86; outlier detection; household expenditures; robust methods; zeros

More Related Videos

An R-Based Landscape Validation of a Competing Risk Model

Published on: September 16, 2022

Detection of Architectural Distortion in Prior Mammograms via Analysis of Oriented Patterns

Published on: August 30, 2013

Area of Science:

Statistics
Data Science
Econometrics

Background:

Outlier detection, the identification of data points that do not conform to the bulk of the data, is a crucial step in data preprocessing.
Many real-world datasets, including household expenditure data, contain structural zeros, missing values, and compositional variables, challenging traditional outlier detection methods.
Accurate outlier identification is vital for reliable estimation of economic indicators such as Purchasing Power Parity.
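Compositional variables, such as expenditure shares that together make up a household budget, are usually analyzed in log-ratio coordinates rather than as raw values. As a hedged illustration of why structural zeros complicate this (it is not the CoDa-Cov procedure from the study), the sketch below implements the centered log-ratio (clr) transform, clr(x)_i = log x_i - mean_j log x_j, and shows that it is undefined when any part is zero. The function name and budget shares are hypothetical.

```python
import math


def clr(parts):
    """Centered log-ratio transform: log(x_i) minus the mean of all log(x_j)."""
    if any(x <= 0 for x in parts):
        # A structural zero (e.g. no spending on a category) has no logarithm
        raise ValueError("clr requires strictly positive parts")
    logs = [math.log(x) for x in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# Hypothetical household budget shares: food, housing, transport
print(clr([0.2, 0.5, 0.3]))

try:
    clr([0.4, 0.6, 0.0])  # a household with no transport spending
except ValueError as err:
    print("cannot transform:", err)
```

Because clr depends only on ratios between parts, rescaling a composition (for example, from shares to amounts in currency) leaves its clr coordinates unchanged; a zero, however, has to be handled separately before any log-ratio method can be applied.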

Purpose of the Study:

To compare the performance of robust univariate and multivariate outlier detection methods on challenging datasets.
To assess the effectiveness of various methods in identifying outliers within data characterized by structural zeros, missing values, and compositional variables.
To evaluate outlier imputation strategies based on detection methods for improved indicator estimation.

Main Methods:

A complex simulation study was designed to mimic real-world data challenges, including structural zeros, missing values, and compositional variables.
Robust univariate and multivariate outlier detection techniques were applied and compared.
Performance was evaluated by each method's ability to identify true outliers and influential observations, and by its false discovery rate.
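The abstract does not spell out the detectors or the exact metrics, so the following is only a toy version of such an evaluation under stated assumptions. A robust univariate rule (median and median absolute deviation, with the conventional 1.4826 scaling and a 3.5 cutoff; not necessarily one of the methods the study compared) is applied to a sample with planted outliers. The false discovery rate, FDR = false positives / all flagged points, is then computed together with the share of true outliers detected. All names and data are hypothetical.

```python
import statistics


def mad_outliers(data, cutoff=3.5):
    """Indices whose robust z score |x - median| / (1.4826 * MAD) exceeds cutoff."""
    med = statistics.median(data)
    mad = statistics.median([abs(x - med) for x in data])
    if mad == 0:
        raise ValueError("MAD is zero; robust z scores are undefined")
    # 1.4826 makes the MAD consistent with the standard deviation under normality
    scale = 1.4826 * mad
    return [i for i, x in enumerate(data) if abs(x - med) / scale > cutoff]


def detection_metrics(flagged, true_outliers):
    """Return (false discovery rate, detection rate) for one data set."""
    flagged, true_outliers = set(flagged), set(true_outliers)
    false_pos = len(flagged - true_outliers)
    true_pos = len(flagged & true_outliers)
    # Convention assumed here: FDR is 0 when nothing is flagged
    fdr = false_pos / len(flagged) if flagged else 0.0
    detected = true_pos / len(true_outliers) if true_outliers else 1.0
    return fdr, detected


# Toy sample around 5.0 with outliers planted at indices 8 and 9
data = [5.1, 4.9, 5.0, 5.2, 4.8, 5.3, 5.0, 4.7, 9.5, 0.5]
flagged = mad_outliers(data)
print(flagged, detection_metrics(flagged, {8, 9}))
```

Using the median and MAD instead of the mean and standard deviation keeps the two planted values from masking each other. A simulation study would repeat this over many generated data sets and average the metrics.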

Main Results:

Robust multivariate outlier detection methods demonstrated superior performance compared to robust univariate methods.
The generalized S-estimators (GSE), the BACON-EEM algorithm, and a compositional method (CoDa-Cov) were identified as the best-performing techniques.
These top methods also excelled in outlier imputation, leading to more accurate indicator estimations.

Conclusions:

Multivariate robust methods are essential for accurate outlier detection in complex, real-world datasets.
Specific methods like GSE, BACON-EEM, and CoDa-Cov offer high effectiveness and low false discovery rates.
Integrating advanced outlier detection with imputation enhances the reliability of statistical indicators derived from challenging data.