UniAV: Unified Audio-Visual Perception for Multi-Task Video Event Localization.

Tiantian Geng, Teng Wang, Jinming Duan

IEEE Transactions on Pattern Analysis and Machine Intelligence

August 6, 2025

Summary

This summary is machine-generated.

UniAV unifies temporal action localization, sound event detection, and audio-visual event localization for holistic video understanding. This novel framework outperforms specialized models and naive multi-task approaches across benchmarks.

Area of Science:

Computer Vision
Machine Learning
Artificial Intelligence

Background:

Video event localization encompasses temporal action localization (TAL), sound event detection (SED), and audio-visual event localization (AVEL).
Current methods often overspecialize in individual tasks, hindering a comprehensive understanding of video content.
Existing task-specific datasets exhibit significant disparities in size, domain, and duration, complicating unified approaches.
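
As a rough illustration of what the three tasks have in common, each can be viewed as predicting labelled time segments in a video. The sketch below is not taken from the paper: the `Event` record and the `temporal_iou` helper are hypothetical names, and temporal intersection-over-union is shown only as one common way to compare a predicted segment with a reference segment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical record for one localized event (not the paper's format)."""
    task: str     # "TAL", "SED", or "AVEL"
    label: str    # event category
    start: float  # start time in seconds
    end: float    # end time in seconds


def temporal_iou(a: Event, b: Event) -> float:
    """Overlap of two time segments divided by the length of their union."""
    inter = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - inter
    return inter / union if union > 0 else 0.0


# Illustrative values only.
pred = Event("SED", "dog bark", 2.0, 6.0)
truth = Event("SED", "dog bark", 4.0, 8.0)
print(round(temporal_iou(pred, truth), 3))
```

Because all three tasks can share a record of this shape, a single model can in principle emit events for each of them, which is the setting the unified approach targets.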

Purpose of the Study:

To develop a unified framework for simultaneously addressing TAL, SED, and AVEL tasks.
To facilitate holistic video understanding by integrating knowledge across different event types and modalities.
To overcome the challenges posed by distinct task characteristics and dataset disparities in existing methods.

Main Methods:

Introduction of UniAV, a Unified Audio-Visual perception network.
Development of a unified audio-visual encoder for generic representations across multiple temporal scales.
Design of task-specific experts to capture unique knowledge for each task.
Implementation of a novel unified language-aware classifier with semantic-aligned task prompts for flexible, open-set localization.
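
The components listed above can be sketched in a few lines, under heavy simplification. This is a toy illustration, not the UniAV implementation: the shared encoder is reduced to feature concatenation, the task experts to fixed per-dimension weights, and the language-aware classifier to cosine similarity against hand-written prompt vectors that stand in for the output of a text encoder. All names (`encode`, `EXPERT_WEIGHTS`, `classify`, `PROMPTS`) and all numbers are assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def encode(audio, visual):
    """Stand-in for a shared encoder: concatenate per-segment features."""
    return [a + v for a, v in zip(audio, visual)]


# Hypothetical task experts: each task re-weights the shared features.
# Dimensions 0-1 are audio, 2-3 are visual in this toy setup.
EXPERT_WEIGHTS = {
    "TAL": [0.2, 0.2, 1.0, 1.0],
    "SED": [1.0, 1.0, 0.2, 0.2],
    "AVEL": [1.0, 1.0, 1.0, 1.0],
}


def classify(task, feats, prompt_embeddings):
    """Label each segment with the class prompt most similar to it."""
    weights = EXPERT_WEIGHTS[task]
    labels = []
    for f in feats:
        weighted = [w * x for w, x in zip(weights, f)]
        labels.append(max(prompt_embeddings,
                          key=lambda name: cosine(weighted, prompt_embeddings[name])))
    return labels


# Hand-written vectors standing in for text embeddings of class prompts.
PROMPTS = {
    "dog barking": [1.0, 0.0, 0.0, 0.0],
    "person running": [0.0, 0.0, 1.0, 0.0],
}

audio = [[0.9, 0.1], [0.1, 0.0]]   # two segments, toy audio features
visual = [[0.1, 0.0], [0.8, 0.1]]  # two segments, toy visual features
feats = encode(audio, visual)
print(classify("AVEL", feats, PROMPTS))
```

Since classes are represented by prompt vectors rather than by fixed output units, adding an entry to `PROMPTS` extends the label set without changing `classify`, which is the intuition behind open-set localization with a language-aware classifier.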

Main Results:

UniAV significantly outperforms single-task models and naive multi-task baselines across all three localization tasks.
The unified architecture effectively learns and shares knowledge across tasks and modalities.
Superior or on-par performance is achieved compared to state-of-the-art task-specific methods on ActivityNet 1.3, DESED, and UnAV-100.
The model also demonstrates open-set localization, detecting events from novel categories.

Conclusions:

UniAV offers an effective unified framework for multi-task video event localization.
The proposed architecture enhances holistic video understanding by integrating diverse event information.
UniAV represents a significant advancement in audio-visual perception and event localization research.