Discriminative Cross-Modality Attention Network for Temporal Inconsistent Audio-Visual Event Localization.

Hanyu Xuan, Lei Luo, Zhenyu Zhang

IEEE Transactions on Image Processing : a Publication of the IEEE Signal Processing Society

September 3, 2021

Summary

This summary is machine-generated.

This study introduces a novel network for audio-visual event localization, effectively handling temporal inconsistencies by adaptively filtering information. The approach enhances multi-modality perception for more accurate event identification.

Area of Science:

Computer Vision
Artificial Intelligence
Signal Processing

Background:

Data from a single modality is insufficient to capture real-world semantics comprehensively.
Audio-visual event localization requires matching audio and visual data for event identification.
Existing methods struggle with temporal inconsistencies in audio-visual scenes.
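
To make the notion of temporal inconsistency concrete, the sketch below scores how well the audio and visual features of each video segment agree and flags the segments where they diverge. It is an illustration only, not the method of the study. It assumes that both modalities have already been projected into a shared embedding space; the function name `segment_agreement`, the toy feature values, and the 0.5 threshold are all hypothetical.

```python
import numpy as np

def segment_agreement(audio_segs, visual_segs, eps=1e-8):
    """Cosine similarity between audio and visual features, one value per segment.

    Both inputs have shape (num_segments, dim) and are assumed to live in a
    shared embedding space (an assumption of this sketch).
    """
    a = audio_segs / (np.linalg.norm(audio_segs, axis=1, keepdims=True) + eps)
    v = visual_segs / (np.linalg.norm(visual_segs, axis=1, keepdims=True) + eps)
    return np.sum(a * v, axis=1)

# Toy data: three segments, two feature dimensions (hypothetical values).
audio = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
visual = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])

agreement = segment_agreement(audio, visual)

# Segments whose agreement falls below a hypothetical threshold are treated
# as temporally inconsistent (the sound and the picture do not match).
inconsistent = np.where(agreement < 0.5)[0]
print(agreement, inconsistent)
```

In this toy input only the middle segment is flagged, because its audio and visual vectors point in orthogonal directions while the other two segments match.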

Purpose of the Study:

To develop a method for audio-visual event localization that overcomes temporal inconsistencies.
To simulate human multi-modality perception for adaptive information filtering.
To improve the fusion of audio and visual signals for robust event localization.

Main Methods:

Proposed a discriminative cross-modality attention network inspired by human perception.
Implemented adaptive attention mechanisms that determine 'where', 'when', and 'which' to attend to.
Introduced a novel eigenvalue-based objective function for training and signal fusion.
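
As a rough illustration of the 'where' component listed above, the following sketch uses an audio feature to weight visual regions with scaled dot-product attention. This is a generic cross-modal attention pattern swapped in for illustration. It is not the network described in the study, and it does not implement the eigenvalue-based objective, whose form is not given here. The function name `audio_guided_spatial_attention` and the toy inputs are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def audio_guided_spatial_attention(audio_feat, visual_feats):
    """Weight visual regions by their similarity to the audio feature.

    audio_feat:   shape (dim,)
    visual_feats: shape (num_regions, dim)
    Returns the attention weights and the attended visual feature.
    """
    dim = audio_feat.shape[0]
    scores = visual_feats @ audio_feat / np.sqrt(dim)  # one score per region
    weights = softmax(scores)                          # 'where' to attend
    attended = weights @ visual_feats                  # weighted sum of regions
    return weights, attended

# Toy input (hypothetical): region 2 is aligned with the audio feature.
audio = np.array([1.0, 0.0, 0.0, 0.0])
regions = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [2.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])

weights, attended = audio_guided_spatial_attention(audio, regions)
print(weights, attended)
```

The same pattern could be applied along the time axis to decide 'when' to attend, by treating segments rather than spatial regions as the items being weighted.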

Main Results:

The network adaptively selects event-relevant information, even with significant temporal inconsistencies.
Achieved improved audio-visual signal fusion, yielding discriminative and nonlinear representations.
Systematically investigated temporal, weakly-supervised spatial, and cross-modality localization subtasks.

Conclusions:

The proposed network effectively addresses temporal inconsistencies in audio-visual event localization.
The eigenvalue-based objective function enhances multi-modality representation and fusion.
The approach offers a more robust solution for complex audio-visual perception tasks.