Evaluation of robust outlier detection methods for zero-inflated complex data.

M Templ¹,², J Gussenbauer³, P Filzmoser²

  • ¹Zurich University of Applied Sciences, Winterthur, Switzerland.

Journal of Applied Statistics | June 16, 2022
Summary
This summary is machine-generated.

Robust multivariate outlier detection methods identify true outliers in complex datasets that contain zeros and compositional variables. They outperform robust univariate methods and so improve data preprocessing for economic indicators such as purchasing power parity.

Keywords: 62H86; outlier detection; household expenditures; robust methods; zeros

Area of Science:

  • Statistics
  • Data Science
  • Econometrics

Background:

  • Outlier detection is a key data preprocessing step: it locates data points that do not conform to the majority of observations.
  • Many real-world datasets, including household expenditure data, contain structural zeros, missing values, and compositional variables, all of which challenge traditional outlier detection methods.
  • Accurate outlier identification is vital for reliable estimation of economic indicators such as purchasing power parity.
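
To illustrate why zeros are awkward for compositional variables, here is a minimal sketch of the centered log-ratio (clr) transform, one common log-ratio transform in compositional data analysis. It is not necessarily the transform used in the study; the function name `clr` and the example shares are assumptions made for illustration only.

```python
import math

def clr(parts):
    """Centered log-ratio: log of each part minus the mean of the logs.

    Illustrative helper (not from the study). Log-ratios need strictly
    positive parts, so a zero share cannot be transformed directly.
    """
    if any(p <= 0 for p in parts):
        raise ValueError("clr is undefined for zero or negative parts")
    logs = [math.log(p) for p in parts]
    centre = sum(logs) / len(logs)
    return [value - centre for value in logs]

# Hypothetical expenditure shares for one household (parts of a whole).
shares = [0.5, 0.3, 0.2]
coords = clr(shares)

# The same household in absolute amounts gives the same coordinates,
# because only the ratios between parts carry information.
coords_from_amounts = clr([50.0, 30.0, 20.0])
```

A household that spends nothing in one category, for example `[0.6, 0.4, 0.0]`, raises the `ValueError` above, which is the kind of difficulty that structural zeros create for log-ratio methods.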

Purpose of the Study:

  • To compare the performance of robust univariate and multivariate outlier detection methods on challenging datasets.
  • To assess the effectiveness of various methods in identifying outliers within data characterized by structural zeros, missing values, and compositional variables.
  • To evaluate outlier imputation strategies based on detection methods for improved indicator estimation.

Main Methods:

  • A complex simulation study was designed to mimic real-world data challenges, including structural zeros, missing values, and compositional variables.
  • Robust univariate and multivariate outlier detection techniques were applied and compared.
  • Performance was evaluated by the ability to identify true outliers and influential observations and by the false discovery rate.
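
The two evaluation criteria can be sketched as follows, assuming the true outliers are known, as they are in a simulation. The false discovery rate is the share of flagged observations that are not true outliers. The function name `detection_metrics` and the index sets are hypothetical and are not taken from the study.

```python
def detection_metrics(true_outliers, flagged):
    """Return (detection rate, false discovery rate) for two index sets.

    Illustrative helper: detection rate is the share of true outliers
    that were flagged; false discovery rate is the share of flagged
    observations that are not true outliers.
    """
    true_outliers = set(true_outliers)
    flagged = set(flagged)
    true_positives = len(true_outliers & flagged)
    detection_rate = true_positives / len(true_outliers) if true_outliers else 0.0
    false_discovery_rate = (
        (len(flagged) - true_positives) / len(flagged) if flagged else 0.0
    )
    return detection_rate, false_discovery_rate

# Hypothetical simulation run: 4 planted outliers, 5 observations flagged.
detection_rate, false_discovery_rate = detection_metrics(
    true_outliers={3, 7, 12, 20},
    flagged={3, 7, 12, 15, 18},
)
```

In this made-up run, three of the four planted outliers are found and two of the five flags are false alarms.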

Main Results:

  • Robust multivariate outlier detection methods demonstrated superior performance compared to robust univariate methods.
  • Generalized S-estimators (GSE), the BACON-EEM algorithm, and a compositional method (CoDa-Cov) performed best.
  • These top methods also excelled in outlier imputation, leading to more accurate indicator estimations.
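
One reason a multivariate rule can outperform a univariate one is sketched below on made-up data. The univariate rule uses robust z-scores (median and MAD) on each variable separately. The multivariate rule uses Mahalanobis distances computed from the classical mean and covariance, which is a deliberate simplification: the study's best methods (GSE, BACON-EEM, CoDa-Cov) use robust estimators that are not reproduced here. All names, cutoffs, and data points are assumptions for illustration.

```python
import math
from statistics import mean, median

def robust_z_flags(values, cutoff=3.0):
    """Flag indices whose robust z-score (median/MAD) exceeds the cutoff."""
    centre = median(values)
    scale = 1.4826 * median([abs(v - centre) for v in values])
    return {i for i, v in enumerate(values) if abs(v - centre) / scale > cutoff}

def mahalanobis_flags(xs, ys, level=0.975):
    """Flag indices by squared Mahalanobis distance in two dimensions.

    Uses the classical (non-robust) mean and covariance for brevity.
    """
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2
    # Chi-square quantile with 2 degrees of freedom has a closed form.
    cutoff = -2.0 * math.log(1.0 - level)
    flags = set()
    for i, (x, y) in enumerate(zip(xs, ys)):
        dx, dy = x - mx, y - my
        d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        if d2 > cutoff:
            flags.add(i)
    return flags

# Made-up data: 13 points close to the line y = x, plus one point (index 13)
# whose coordinates are ordinary one at a time but contradict the correlation.
xs = [-3.0 + 0.5 * i for i in range(13)] + [2.0]
ys = [x + 0.2 * (-1) ** i for i, x in enumerate(xs[:13])] + [-2.0]

univariate_flags = robust_z_flags(xs) | robust_z_flags(ys)
multivariate_flags = mahalanobis_flags(xs, ys)
```

Here the per-variable rule flags nothing, while the distance-based rule flags only the planted point at index 13. With many or clustered outliers the classical covariance itself becomes distorted, which is the motivation for robust estimators.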

Conclusions:

  • Multivariate robust methods are essential for accurate outlier detection in complex, real-world datasets.
  • GSE, BACON-EEM, and CoDa-Cov detect outliers effectively while keeping false discovery rates low.
  • Integrating advanced outlier detection with imputation enhances the reliability of statistical indicators derived from challenging data.