



A Comparative Evaluation of Unsupervised Anomaly Detection Algorithms for Multivariate Data.

Markus Goldstein1, Seiichi Uchida2

  • 1Center for Co-Evolutional Social System Innovation, Kyushu University, Fukuoka, Japan.

PLoS One | April 20, 2016
Summary
This summary is machine-generated.

This study provides a comprehensive comparison of 19 different automated methods designed to identify unusual patterns in complex, unlabeled data across various fields like medicine and cybersecurity. By testing these tools on 10 diverse datasets, the authors clarify which approaches perform best under different conditions, offering practical guidance for researchers and practitioners.

Keywords:
outlier identification, machine learning benchmarks, data mining evaluation, computational efficiency


Area of Science:

  • Computational intelligence within unsupervised anomaly detection research
  • Data science and statistical analysis frameworks

Background:

Identifying unexpected items in complex, unlabeled data remains a significant hurdle. Many detection techniques exist, but before this study their relative effectiveness was poorly understood: the field lacked a universal comparative framework and standardized benchmarks, and publicly available datasets for validating detection performance were scarce. As a result, researchers often struggled to select an appropriate tool. These gaps motivated a rigorous, multi-domain assessment of existing methods, which no previous investigation had carried out across such a broad range of algorithmic approaches.

Purpose Of The Study:

This study aims to provide a comprehensive comparative evaluation of 19 unsupervised anomaly detection algorithms, addressing the lack of a universal assessment framework in the literature. The researchers intend to clarify how these tools perform across 10 diverse datasets from multiple application domains, and to meet the need for standardized benchmarks that can guide practitioners in real-world settings. By publishing their source code, the team hopes to establish a new, well-founded basis for future investigations. The authors also aim to outline, for the first time, the specific strengths and weaknesses of each approach, and they investigate how parameter settings and computational requirements affect detection efficacy. The overall goal is clear advice on algorithm selection for complex data analysis tasks.

Main Methods:

The authors adopt a systematic comparative design to evaluate 19 distinct computational approaches. They use 10 diverse datasets drawn from practical application domains to ensure broad applicability. Each method is tested against the same standardized criteria so that performance can be compared consistently. The researchers analyze how specific parameter settings affect the output of each model, and they document the computational effort each algorithm requires during testing. They also investigate the distinction between local and global detection behavior across all evaluated techniques. All source code and datasets are made publicly available to support transparency and reproducibility. This methodology provides a structured way to compare disparate models on a level playing field.
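As a rough illustration of this kind of evaluation loop, the following sketch scores an unlabeled dataset with a simple detector and then checks the ranking against held-back labels. Everything in it is an assumption made for illustration: the summary above does not name the evaluation criterion or the algorithms, so ROC AUC and a k-nearest-neighbour distance score are used as stand-ins, and the synthetic data, the function names `knn_scores` and `roc_auc`, and the values of `k` are hypothetical.

```python
import math
import random

def knn_scores(points, k):
    """Anomaly score of each point: mean distance to its k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

def roc_auc(scores, labels):
    """Chance that a random anomaly outscores a random normal point (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Synthetic stand-in data: one Gaussian cluster plus three far-away points.
rng = random.Random(0)
normal = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
anomalies = [(8.0, 8.0), (-9.0, 7.0), (10.0, -8.0)]
points = normal + anomalies
# Labels are used only to evaluate the ranking, never to compute the scores.
labels = [0] * len(normal) + [1] * len(anomalies)

# Sweep the single parameter k to see how sensitive the result is to it.
for k in (5, 10, 20):
    auc = roc_auc(knn_scores(points, k), labels)
    print(f"k={k:2d}  AUC={auc:.3f}")
```

The detector never sees the labels, which mirrors the unsupervised setting; sweeping `k` is one simple way to probe parameter sensitivity.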

Main Results:

The study reveals the specific strengths and weaknesses of 19 different approaches for the first time. Its key findings indicate that performance varies significantly with the underlying structure of the dataset. The researchers quantify the computational effort required for each method, highlighting trade-offs between speed and accuracy. They identify how different parameter configurations affect the reliability of the detection results. The evaluation demonstrates that some algorithms excel at identifying global outliers, while others are better suited to local anomalies. This comprehensive analysis provides empirical evidence for the relative effectiveness of each tested model. The authors report that no single algorithm performs optimally across all 10 datasets. These results establish a new baseline for comparing future developments in the field.
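The global versus local distinction can be made concrete with a small sketch. It is an illustration under stated assumptions, not the authors' setup: the summary does not name the 19 methods, so a k-nearest-neighbour distance score stands in for a global detector and a simplified Local Outlier Factor (using exactly k neighbours) stands in for a local one. The toy data, a dense grid, a sparse grid, and one point planted just outside the dense grid, are hypothetical.

```python
import math

def neighbours(points, i, k):
    """The k nearest neighbours of points[i] as (distance, index) pairs."""
    pairs = sorted((math.dist(points[i], q), j)
                   for j, q in enumerate(points) if j != i)
    return pairs[:k]

def knn_scores(points, k):
    """Global score: mean distance to the k nearest neighbours."""
    return [sum(d for d, _ in neighbours(points, i, k)) / k
            for i in range(len(points))]

def lof_scores(points, k):
    """Local score: simplified LOF using exactly k neighbours (ties cut by index)."""
    nbrs = [neighbours(points, i, k) for i in range(len(points))]
    k_dist = [nb[-1][0] for nb in nbrs]  # distance to the k-th neighbour
    # Local reachability density: inverse of the mean reachability distance.
    lrd = [k / sum(max(k_dist[j], d) for d, j in nb) for nb in nbrs]
    # LOF: mean density of the neighbours relative to the point's own density.
    return [sum(lrd[j] for _, j in nb) / (k * lrd[i])
            for i, nb in enumerate(nbrs)]

def rank(scores, idx):
    """Rank of points[idx] by score, where 1 means most anomalous."""
    return 1 + sum(s > scores[idx] for s in scores)

dense = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(5)]
sparse = [(20 + 2 * i, 20 + 2 * j) for i in range(5) for j in range(5)]
planted = (1.0, 0.2)  # close to the dense grid in absolute terms, far relative to its spacing
points = dense + sparse + [planted]
planted_idx = len(points) - 1

knn = knn_scores(points, k=3)
lof = lof_scores(points, k=3)
print("k-NN rank of planted point:", rank(knn, planted_idx))
print("LOF  rank of planted point:", rank(lof, planted_idx))
```

On this toy data the k-NN score places all 25 sparse-grid points above the planted point, because their absolute neighbour distances are larger, whereas the density-relative LOF score ranks the planted point first.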

Conclusions:

The authors provide practical guidance on selecting suitable methods for various real-world scenarios. This synthesis highlights the distinct advantages and limitations of each evaluated approach for the first time. The researchers demonstrate that performance varies significantly based on the specific characteristics of the input data. Their results emphasize the importance of considering computational requirements alongside detection accuracy. The study clarifies how parameter settings influence the reliability of these automated systems. By releasing their source code, the team establishes a stable foundation for future investigations. This work serves as a reference for practitioners navigating the complex landscape of outlier identification. The findings offer a clear path forward for improving the robustness of detection systems in diverse application domains.

The researchers propose that algorithm selection depends on balancing detection accuracy with computational efficiency. Unlike standard classification, their approach identifies outliers in unlabeled data by analyzing internal dataset structures rather than relying on predefined labels.

The authors utilize 19 distinct unsupervised anomaly detection algorithms. These tools are tested against 10 diverse datasets to ensure a broad evaluation, contrasting with previous studies that often focused on isolated or limited testing environments.

A standardized evaluation framework is necessary because the research community previously lacked common benchmarks. This technical requirement allows for a fair comparison between different methods, ensuring that strengths and weaknesses are identified consistently across various application domains.

The authors use multivariate data to assess how different models handle complex, multi-dimensional information. This data type plays a role in determining whether an algorithm effectively captures local or global patterns within the underlying structure.

The researchers measure computational effort alongside detection performance. This measurement reveals how specific models scale, providing a more comprehensive view than studies that only report accuracy metrics.
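One simple way to record computational effort next to detection output is to wrap each detector call in a wall-clock timer. The sketch below is a minimal illustration, assuming a brute-force k-nearest-neighbour scorer as the detector; the helper name `time_detector`, the dataset sizes, and the value of `k` are hypothetical, and the summary does not say how the authors measured runtime.

```python
import math
import random
import time

def knn_scores(points, k):
    """Brute-force anomaly score: mean distance to the k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

def time_detector(detector, points, **params):
    """Run a detector once and return (scores, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    scores = detector(points, **params)
    return scores, time.perf_counter() - start

rng = random.Random(1)
for n in (100, 200, 400):
    data = [(rng.random(), rng.random()) for _ in range(n)]
    _, seconds = time_detector(knn_scores, data, k=10)
    # The brute-force scorer compares every pair, so time grows faster than n.
    print(f"n={n:4d}  {seconds:.4f} s")
```

Timings from a single run are noisy; repeating each measurement and reporting a median would give a steadier picture.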

The authors suggest that their findings provide a well-founded basis for future research. They claim that this comparative analysis serves as a guide for selecting the most effective tools for practical, real-world tasks.