Updated: Jul 4, 2025

Robust Audio-Visual Contrastive Learning for Proposal-Based Self-Supervised Sound Source Localization in Videos.

Hanyu Xuan, Zhiliang Wu, Jian Yang

IEEE Transactions on Pattern Analysis and Machine Intelligence

|February 7, 2024

Summary

This summary is machine-generated.

This study introduces a proposal-based approach to sound source localization (SSL) that localizes sounding objects directly at the semantic object level, rather than through the coarse interpolation maps used by existing methods. It uses Active Contrastive Set Mining to make audio-visual contrastive learning more robust, and it reports state-of-the-art results on multiple SSL datasets.

Area of Science:

Computer Vision
Machine Learning
Signal Processing

Background:

Humans excel at sound source localization (SSL) using audio-visual cues.
Current machine methods often rely on interpolation maps, which provide only coarse-grained localization.
Existing self-supervised learning methods miss the opportunity to reason about the data distribution of large-scale audio-visual samples.

Purpose of the Study:

To develop a novel proposal-based solution for direct, semantic object-level sound source localization without manual annotations.
To enhance audio-visual contrastive learning (AVCL) by addressing limitations in contrastive set construction.
To achieve state-of-the-art performance in sound source localization across diverse scenarios.

Main Methods:

A proposal-based framework for sound source localization (SSL).
Incorporation of Global Response Map (GRM) as an unsupervised spatial constraint.
Formulation of SSL as a Multiple Instance Learning (MIL) problem.
Development of Active Contrastive Set Mining (ACSM) to create informative negative samples for AVCL.
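
As a rough illustration of how the proposal, GRM, and MIL items above could fit together, the sketch below treats one video frame as a bag of region proposals, discards proposals that overlap little with a response map, and picks the remaining proposal whose feature best matches the audio (max pooling, a common MIL choice). The names `Proposal`, `grm_overlap`, and `localize`, the 0.5 threshold, and the cosine-similarity scoring are all assumptions made for illustration; this summary does not describe the paper's actual implementation.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Proposal:
    """A candidate region in a frame (hypothetical structure)."""
    feature: Sequence[float]  # visual embedding of the region
    grm_overlap: float        # assumed: share of the box covered by high-response pixels


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def localize(
    audio: Sequence[float],
    proposals: List[Proposal],
    min_overlap: float = 0.5,  # assumed threshold, not from the source
) -> Tuple[Optional[int], float]:
    """Return (index, score) of the best proposal that passes the spatial filter.

    The bag (frame) is scored by its best instance (proposal), which is the
    max-pooling form of Multiple Instance Learning. If every proposal is
    filtered out, the result is (None, -inf).
    """
    best_idx: Optional[int] = None
    best_score = float("-inf")
    for idx, proposal in enumerate(proposals):
        if proposal.grm_overlap < min_overlap:
            continue  # spatial constraint: skip sound-unrelated regions
        score = cosine(audio, proposal.feature)
        if score > best_score:
            best_idx, best_score = idx, score
    return best_idx, best_score


audio = [1.0, 0.0]
proposals = [
    Proposal(feature=[1.0, 0.0], grm_overlap=0.1),  # best match, but filtered out
    Proposal(feature=[0.6, 0.8], grm_overlap=0.9),
    Proposal(feature=[0.0, 1.0], grm_overlap=0.7),
]
best_idx, best_score = localize(audio, proposals)
print(best_idx)  # 1
```

In this toy input the first proposal matches the audio perfectly but is dropped by the spatial filter, so the second proposal is selected. That is the sense in which a spatial constraint can simplify the bag before the MIL step.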

Main Results:

The proposed method achieves direct, semantic object-level sound source localization.
GRM effectively filters sound-unrelated regions, simplifying the SSL problem.
ACSM generates robust contrastive sets, improving AVCL.
The approach demonstrates state-of-the-art (SOTA) performance on multiple SSL datasets.
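
This summary does not say how ACSM chooses its negatives. As a stand-in, the sketch below shows a generic InfoNCE-style contrastive loss for one audio anchor, together with a simple mining heuristic: keep the candidates most similar to the anchor, but skip any above a similarity ceiling, on the assumption that near-duplicates are likely to be semantically related rather than true negatives. The function names, the `k`, `ceiling`, and `temperature` values, and the heuristic itself are assumptions for illustration, not the paper's method. Features are assumed to be unit-normalized so that a dot product equals cosine similarity.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def info_nce(
    anchor: Sequence[float],
    positive: Sequence[float],
    negatives: List[Sequence[float]],
    temperature: float = 0.1,
) -> float:
    """InfoNCE loss: -log(exp(s_pos/t) / (exp(s_pos/t) + sum_j exp(s_neg_j/t)))."""
    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, neg) / temperature for neg in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(value - peak) for value in logits))
    return log_sum - logits[0]


def mine_negatives(
    anchor: Sequence[float],
    candidates: List[Sequence[float]],
    k: int = 2,
    ceiling: float = 0.9,
) -> List[Sequence[float]]:
    """Keep the k most similar candidates whose similarity is at most `ceiling`."""
    scored = [(dot(anchor, cand), cand) for cand in candidates]
    kept = [pair for pair in scored if pair[0] <= ceiling]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [cand for _, cand in kept[:k]]


anchor = [1.0, 0.0]
positive = [1.0, 0.0]
candidates = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [-1.0, 0.0]]

mined = mine_negatives(anchor, candidates)
print(mined)  # [[0.8, 0.6], [0.0, 1.0]]

loss_mined = info_nce(anchor, positive, mined)
loss_easy = info_nce(anchor, positive, [[-1.0, 0.0]])
print(loss_mined > loss_easy)  # True
```

The mined set yields a larger loss than a trivially easy negative, which is why informative negatives give the model a stronger training signal than randomly sampled ones.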

Conclusions:

The novel proposal-based approach offers a more direct and semantically meaningful solution for sound source localization.
Active Contrastive Set Mining significantly enhances the robustness of audio-visual contrastive learning.
The combined methods represent a significant advancement in audio-visual perception for machines.