A Pattern Dictionary Method for Anomaly Detection.

Elyas Sabeti¹, Sehong Oh², Peter X K Song³

¹Michigan Institute for Data Science, University of Michigan, Ann Arbor, MI 48109, USA.

Entropy (Basel, Switzerland)

August 26, 2022

Summary

This summary is machine-generated.

This study introduces a novel compression-based anomaly detection method using a pattern dictionary for time series and sequence data. This approach effectively identifies unusual patterns by measuring data complexity, enhancing anomaly detection capabilities.

Keywords:

Lempel–Ziv algorithm, anomaly detection, atypicality, lossless compression, pattern dictionary

Area of Science:

Data Science
Machine Learning
Signal Processing

Background:

Anomaly detection is crucial for identifying unusual patterns in time series and sequence data.
Existing methods may struggle with complex patterns and require robust baselines for accurate detection.

Purpose of the Study:

To propose a compression-based anomaly detection method using a pattern dictionary.
To develop a robust system for identifying anomalous patterns in sequential data.
To establish a framework for constructing a baseline of health against which anomalies can be detected.

Main Methods:

Utilizing a pattern dictionary to learn complex patterns in training data.
Employing sequence complexity measures (parsed phrases, codelength) as anomaly scores.
Combining the pattern dictionary with universal source coders for atypicality detection.
Deriving a non-asymptotic upper bound, in terms of the Lambert W function, on the number of distinct phrases produced by the LZ78 parser.
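
The methods listed above can be illustrated with a minimal sketch. It assumes a simplified reading of the approach: the pattern dictionary is taken to be the set of all substrings of the training sequence up to a maximum length, the complexity measure is the number of parsed phrases, and the score is the phrase count under the dictionary parser minus the phrase count under a plain LZ78 parser. The function names, the `max_depth` parameter, and the greedy longest-match parsing rule are assumptions made for illustration and are not taken from the paper.

```python
def lz78_phrases(seq):
    """Parse seq into LZ78 phrases: each phrase is the shortest
    prefix of the remaining input that has not been seen before."""
    seen = set()
    phrases = []
    current = ""
    for symbol in seq:
        current += symbol
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:  # trailing phrase that repeats an earlier one
        phrases.append(current)
    return phrases


def build_pattern_dictionary(training, max_depth):
    """Hypothetical dictionary: every substring of the training
    sequence whose length is between 1 and max_depth."""
    return {
        training[i:i + d]
        for d in range(1, max_depth + 1)
        for i in range(len(training) - d + 1)
    }


def dictionary_phrases(test, dictionary, max_depth):
    """Greedily split test into the longest patterns found in the
    dictionary; an unmatched symbol becomes its own phrase."""
    phrases = []
    i = 0
    while i < len(test):
        length = 1  # fallback when no pattern matches
        for d in range(min(max_depth, len(test) - i), 0, -1):
            if test[i:i + d] in dictionary:
                length = d
                break
        phrases.append(test[i:i + length])
        i += length
    return phrases


def anomaly_score(test, dictionary, max_depth):
    """Assumed score: dictionary phrase count minus LZ78 phrase count.
    Larger values mean the training patterns explain the test poorly."""
    typical = len(dictionary_phrases(test, dictionary, max_depth))
    universal = len(lz78_phrases(test))
    return typical - universal


training = "abababababab"
patterns = build_pattern_dictionary(training, max_depth=4)
print(anomaly_score("abababab", patterns, 4))  # test resembling the training data
print(anomaly_score("abbaabba", patterns, 4))  # test with unfamiliar patterns
```

In this toy setup the sequence that follows the training pattern is covered by fewer dictionary phrases than the one that breaks it, so it receives the lower score. A codelength-based score would replace the phrase counts with encoded lengths.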

Main Results:

The pattern dictionary method effectively detects anomalies by assessing sequence complexity.
Combining with universal source coders creates a powerful atypicality detector.
A novel non-asymptotic bound for LZ78 was derived, defining the anomaly score range.
The framework was illustrated by constructing a baseline of health against which anomalous deviations can be detected.
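
To give a sense of what a non-asymptotic bound on the LZ78 phrase count looks like, the sketch below computes a simple counting bound: the largest number of distinct non-empty strings whose total length fits in n symbols, obtained by using the shortest strings first. This is not the Lambert W bound from the paper, which is not reproduced here; it is an elementary bound computed by enumeration, and the function names are assumptions for illustration. Closed-form versions of such bounds generally require inverting a relation of the form c log c ≈ n, which is the kind of equation the Lambert W function solves.

```python
def max_distinct_phrases(n, alphabet_size=2):
    """Largest number of distinct non-empty strings over the alphabet
    whose total length is at most n (shortest strings are used first)."""
    count = 0
    remaining = n
    length = 1
    while remaining >= length:
        available = alphabet_size ** length  # distinct strings of this length
        usable = min(available, remaining // length)
        count += usable
        remaining -= usable * length
        if usable < available:
            break
        length += 1
    return count


def lz78_distinct_phrase_count(seq):
    """Number of distinct phrases produced by LZ78 parsing of seq
    (a trailing phrase that repeats an earlier one is not counted)."""
    seen = set()
    current = ""
    for symbol in seq:
        current += symbol
        if current not in seen:
            seen.add(current)
            current = ""
    return len(seen)


for seq in ["1011010100010", "0000000000", "0101010101010101"]:
    observed = lz78_distinct_phrase_count(seq)
    bound = max_distinct_phrases(len(seq))
    print(seq, observed, bound)
```

Because every LZ78 phrase except possibly the last is distinct, the observed count can never exceed this bound for a sequence of the same length and alphabet, which in turn limits how large or small a phrase-count-based anomaly score can be.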

Conclusions:

The proposed pattern dictionary method offers a powerful and flexible approach to anomaly detection in sequential data.
The method provides a quantitative anomaly score and can be enhanced with universal source coders.
The derived theoretical bound contributes to understanding the method's performance limits.