Toward Video Anomaly Retrieval From Video Anomaly Detection: New Benchmarks and Model
Summary
This study introduces Video Anomaly Retrieval (VAR), the task of retrieving specific anomalous video segments with text or audio queries, which offers more precise results than traditional video anomaly detection methods.
Area Of Science
- Computer Vision
- Artificial Intelligence
- Machine Learning
Background
- Current video anomaly detection (VAD) primarily focuses on classifying events, which is insufficient for detailed anomaly characterization.
- Existing methods produce only superficial single-label classifications and cannot retrieve specific anomalous content from long, untrimmed videos.
Purpose Of The Study
- To introduce a novel task, Video Anomaly Retrieval (VAR), for precise retrieval of anomalous video content using cross-modal queries such as text descriptions and audio (see the sketch after this list).
- To address the limitations of current VAD by enabling retrieval from long, untrimmed videos that may be only partially relevant to a given query.
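As a rough illustration of the retrieval setting only, not the paper's actual model, the sketch below ranks untrimmed videos against a single text or audio query by cosine similarity of their embeddings. The function names and the use of precomputed, single-vector embeddings are assumptions made for this example.

```python
import numpy as np

def rank_videos(query_emb: np.ndarray, video_embs: np.ndarray) -> np.ndarray:
    """Rank candidate untrimmed videos against one text or audio query.

    query_emb:  (d,)   embedding of the query
    video_embs: (n, d) one embedding per candidate video
    Returns video indices ordered from most to least similar.
    """
    # Normalize so the dot product equals cosine similarity.
    q = query_emb / (np.linalg.norm(query_emb) + 1e-8)
    v = video_embs / (np.linalg.norm(video_embs, axis=1, keepdims=True) + 1e-8)
    scores = v @ q  # (n,) similarity of each video to the query
    return np.argsort(-scores)
```

The difficulty in VAR is that only a short portion of each long, untrimmed video may actually match the query, which is what motivates the anomaly-led sampling listed under Main Methods below.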
Main Methods
- Developed two large-scale VAR benchmarks for evaluation.
- Proposed the Anomaly-Led Alignment Network (ALAN) model, featuring anomaly-led sampling to focus on key video segments (sketched after this list).
- Incorporated an efficient pretext task to enhance fine-grained semantic associations between video and text representations.
- Utilized complementary alignment techniques to improve cross-modal content matching.
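The following minimal sketch conveys the idea of anomaly-led sampling described above: per-segment anomaly scores (e.g., from a VAD-style scoring head) are used to keep only the most anomalous segments before pooling them into a video-level representation. The function names, the top-k selection, and mean pooling are assumptions for illustration; ALAN's actual sampling and alignment modules are more involved.

```python
import numpy as np

def anomaly_led_sampling(segment_feats: np.ndarray,
                         anomaly_scores: np.ndarray,
                         k: int = 8) -> np.ndarray:
    """Keep the k segments with the highest anomaly scores.

    segment_feats:  (t, d) features for t video segments
    anomaly_scores: (t,)   per-segment anomaly scores (e.g., from a VAD-style head)
    """
    keep = np.argsort(-anomaly_scores)[:k]  # indices of the most anomalous segments
    keep.sort()                             # restore temporal order
    return segment_feats[keep]

def pool_video(sampled_feats: np.ndarray) -> np.ndarray:
    """Pool sampled segment features into one video embedding that can be
    matched against a text or audio query (mean pooling, for illustration)."""
    return sampled_feats.mean(axis=0)
```

In the full model, the sampled segments would be related to the query representation through the pretext task and the complementary alignment listed above; the mean pooling here is only a placeholder for that step.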
Main Results
- Experimental results on the developed benchmarks highlight the inherent challenges of the VAR task.
- The proposed ALAN model showed clear advantages in addressing these challenges and retrieved anomalous content effectively on both benchmarks.
Conclusions
- Video Anomaly Retrieval (VAR) offers a more practical and detailed approach to identifying anomalous video content compared to traditional methods.
- The ALAN model and VAR benchmarks provide a strong foundation for future research in cross-modal video retrieval and anomaly detection.

