



On Using Classification Datasets to Evaluate Graph Outlier Detection: Peculiar Observations and New Insights.

Lingxiao Zhao¹, Leman Akoglu¹

  • ¹ Heinz College Information Systems & Public Policy, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA.

Big Data | December 6, 2021
Summary
This summary is machine-generated.

Repurposing graph classification datasets for graph-level outlier detection (GLOD) can produce a "performance flip": a model's ROC-AUC swings from high to worse than random depending on which class is down-sampled to serve as the outliers. This calls current GLOD evaluation practice into question.

Keywords:
classification datasets, graph propagation, outlier evaluation


Area of Science:

  • Data Mining
  • Machine Learning
  • Graph Analytics

Background:

  • Outlier detection models are often evaluated using repurposed classification datasets.
  • Graph-level outlier detection (GLOD) is an understudied area with significant real-world potential.
  • Current practices repurpose graph classification datasets for GLOD by down-sampling one class to create outlier samples.
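
The repurposing step described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `downsample_as_outliers`, the `n_outliers` parameter, and the toy label list are all assumptions made for the example. Each class of a binary classification dataset is down-sampled in turn, giving two outlier detection versions of the same data.

```python
import random


def downsample_as_outliers(labels, outlier_class, n_outliers, seed=0):
    """Keep every sample of the other class(es) as inliers and only
    `n_outliers` randomly chosen samples of `outlier_class` as outliers.

    Returns (kept_indices, is_outlier), where is_outlier[k] is 1 if the
    sample at kept_indices[k] belongs to the down-sampled class.
    Hypothetical helper, written for illustration only.
    """
    rng = random.Random(seed)
    inlier_idx = [i for i, y in enumerate(labels) if y != outlier_class]
    outlier_idx = [i for i, y in enumerate(labels) if y == outlier_class]
    if n_outliers > len(outlier_idx):
        raise ValueError("n_outliers exceeds the size of the outlier class")
    kept = sorted(inlier_idx + rng.sample(outlier_idx, n_outliers))
    is_outlier = [1 if labels[i] == outlier_class else 0 for i in kept]
    return kept, is_outlier


# Toy binary dataset: 50 graphs of class 0 and 50 graphs of class 1.
labels = [0] * 50 + [1] * 50

# Version A: class 1 is down-sampled and treated as the outlier class.
kept_a, y_a = downsample_as_outliers(labels, outlier_class=1, n_outliers=5)
# Version B: class 0 is down-sampled and treated as the outlier class.
kept_b, y_b = downsample_as_outliers(labels, outlier_class=0, n_outliers=5)
```

Both versions come from the same labeled data and differ only in which class plays the outlier role, which is the choice the study examines.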

Purpose of the Study:

  • To identify and analyze an issue with repurposing graph classification datasets for GLOD.
  • To investigate the causes of performance variations in GLOD models.
  • To question the appropriateness of current GLOD evaluation methodologies.

Main Methods:

  • Investigated the impact of down-sampling different classes on ROC-AUC performance for GLOD.
  • Analyzed graph embedding spaces generated by propagation-based models.
  • Examined various graph embedding methods and downstream outlier detectors.
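
For readers unfamiliar with the metric, ROC-AUC can be read as the probability that a randomly chosen outlier receives a higher outlier score than a randomly chosen inlier. The sketch below computes it directly from that definition in plain Python; the function name `roc_auc` and the example scores are illustrative assumptions, not values from the study.

```python
def roc_auc(scores, is_outlier):
    """ROC-AUC as the fraction of (outlier, inlier) pairs in which the
    outlier has the higher score; ties count as half a win."""
    pos = [s for s, y in zip(scores, is_outlier) if y == 1]
    neg = [s for s, y in zip(scores, is_outlier) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one outlier and one inlier")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up outlier scores for four samples; the last two are true outliers.
scores = [0.1, 0.4, 0.35, 0.8]
truth = [0, 0, 1, 1]

auc = roc_auc(scores, truth)                          # 3 of 4 pairs ranked correctly
auc_inverted = roc_auc([-s for s in scores], truth)   # same detector, ranking reversed
```

A value of 0.5 corresponds to random ranking, so a value below 0.5 means the detector systematically ranks inliers above outliers. That is the "worse than random" regime referred to in the results.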

Main Results:

  • A significant "performance flip" was observed, where model performance drastically changes (from high to worse-than-random) depending on the down-sampled class.
  • Performance gaps were amplified by propagation in certain models.
  • Disparity in within-class densities and overlapping class supports in embedding spaces were identified as key factors.
  • The performance flip issue persists across different embedding methods, though the specific down-sampled version yielding higher performance may vary.
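
The role of density disparity and overlapping supports can be seen in a toy example. The sketch below is constructed to exhibit the effect and does not reproduce the paper's data or models: the one-dimensional integer "embeddings", the nearest-neighbor distance detector, and all function names are assumptions. One class is tightly packed, the other is spread out over a range that contains the first.

```python
def roc_auc(scores, is_outlier):
    """Fraction of (outlier, inlier) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, is_outlier) if y == 1]
    neg = [s for s, y in zip(scores, is_outlier) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def nn_distance_scores(points):
    """Outlier score = distance to the nearest other point (larger = more outlying)."""
    return [
        min(abs(p - q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]


def evaluate(inliers, outliers):
    """Score all points jointly and return the ROC-AUC of the detector."""
    points = inliers + outliers
    truth = [0] * len(inliers) + [1] * len(outliers)
    return roc_auc(nn_distance_scores(points), truth)


# Toy 1-D embeddings: a dense class (spacing 1) lying inside the support
# of a sparse class (spacing 100).
dense = list(range(-10, 10))
sparse = [100 * j + 50 for j in range(-10, 10)]

# Version 1: the sparse class is down-sampled to two points and used as outliers.
auc_sparse_as_outliers = evaluate(dense, sparse[::10])
# Version 2: the dense class is down-sampled to two points and used as outliers.
auc_dense_as_outliers = evaluate(sparse, dense[::10])
```

In this toy setting the detector separates the two groups perfectly when the sparse class supplies the outliers, and ranks every outlier below every inlier when the dense class does. The detector and the data are identical in both cases; only the choice of down-sampled class changes.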

Conclusions:

  • The identified performance flip issue raises concerns about the validity of averaging GLOD performance across different down-sampled dataset versions.
  • There is a need to develop improved graph embedding methods to mitigate the observed performance flip.
  • Further research is required to establish robust evaluation protocols for GLOD.