Robust Audio-Visual Contrastive Learning for Proposal-Based Self-Supervised Sound Source Localization in Videos.

Hanyu Xuan, Zhiliang Wu, Jian Yang

    IEEE Transactions on Pattern Analysis and Machine Intelligence | February 7, 2024
    Summary
    This summary is machine-generated.

    This study introduces a novel proposal-based approach for semantic object-level sound source localization (SSL), improving upon existing methods. It utilizes active contrastive set mining for more robust audio-visual learning, achieving state-of-the-art results.

    Area of Science:

    • Computer Vision
    • Machine Learning
    • Signal Processing

    Background:

    • Humans excel at sound source localization (SSL) using audio-visual cues.
    • Current machine methods often localize sources with maps obtained by interpolation, which gives only coarse-grained localization.
    • Existing self-supervised learning methods miss the opportunity to reason about the data distribution across large-scale audio-visual samples.

    Purpose of the Study:

    • To develop a novel proposal-based solution for direct, semantic object-level sound source localization without manual annotations.
    • To enhance audio-visual contrastive learning (AVCL) by addressing limitations in contrastive set construction.
    • To achieve state-of-the-art performance in sound source localization across diverse scenarios.

    Main Methods:

    • A proposal-based framework for sound source localization (SSL).
    • Incorporation of Global Response Map (GRM) as an unsupervised spatial constraint.
    • Formulation of SSL as a Multiple Instance Learning (MIL) problem.
    • Development of Active Contrastive Set Mining (ACSM) to create informative negative samples for AVCL.
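
The combination of proposals, Multiple Instance Learning, and audio-visual contrastive learning listed above can be pictured with a small sketch. This is not the authors' implementation: the function names (`l2_normalize`, `mil_contrastive_loss`), the max-over-proposals bag score, the InfoNCE-style loss, and the `temperature` value are all assumptions chosen for illustration. Each video is treated as a bag of proposal embeddings, the bag is scored against an audio embedding by its best-matching proposal, and matching audio-visual pairs are contrasted against the other pairs in the batch.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def mil_contrastive_loss(proposals, audio, temperature=0.07):
    """Hypothetical MIL-style audio-visual contrastive loss (illustrative only).

    proposals: (B, P, D) array, P proposal embeddings for each of B videos.
    audio:     (B, D) array, one audio embedding per video.
    Returns the scalar loss and, per video, the index of the proposal that
    best matches its own audio (the assumed sound source).
    """
    v = l2_normalize(proposals)
    a = l2_normalize(audio)
    # sim[i, j, p]: cosine similarity of proposal p in video i with audio j.
    sim = np.einsum("ipd,jd->ijp", v, a)
    # MIL assumption: a bag (video) is scored by its best-matching proposal.
    bag = sim.max(axis=2)                                   # (B, B)
    logits = bag / temperature
    logits = logits - logits.max(axis=1, keepdims=True)     # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal (video i paired with its own audio).
    loss = -np.mean(np.diag(log_prob))
    idx = np.arange(proposals.shape[0])
    best = sim[idx, idx].argmax(axis=1)                     # (B,)
    return loss, best

# Toy usage: proposal 2 of every video is made identical to that video's audio.
rng = np.random.default_rng(0)
audio = rng.normal(size=(4, 16))
proposals = rng.normal(size=(4, 5, 16))
proposals[:, 2, :] = audio
loss, best = mil_contrastive_loss(proposals, audio)
```

In this sketch the negatives are simply the other items in the batch. A mining step such as ACSM would instead choose which negatives enter the contrastive set; how it does so is not described in this summary, so it is not modeled here.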

    Main Results:

    • The proposed method achieves direct, semantic object-level sound source localization.
    • GRM effectively filters sound-unrelated regions, simplifying the SSL problem.
    • ACSM generates robust contrastive sets, improving AVCL.
    • The approach demonstrates state-of-the-art (SOTA) performance on multiple SSL datasets.
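
To make the filtering role of the response map concrete, here is a minimal sketch, assuming the map is a 2-D array of per-pixel scores in [0, 1] and proposals are integer pixel boxes. The function name `filter_proposals`, the mean-response criterion, and the `threshold` value are hypothetical and are not taken from the paper.

```python
import numpy as np

def filter_proposals(response_map, boxes, threshold=0.5):
    """Keep boxes whose mean response is at least `threshold` (illustrative only).

    response_map: (H, W) array of scores, higher meaning more sound-related.
    boxes: iterable of (x0, y0, x1, y1) integer pixel boxes, x1 and y1 exclusive.
    """
    kept = []
    for (x0, y0, x1, y1) in boxes:
        region = response_map[y0:y1, x0:x1]
        # Skip empty boxes, then compare the average response to the threshold.
        if region.size and region.mean() >= threshold:
            kept.append((x0, y0, x1, y1))
    return kept

# Toy usage: a single high-response square in an 8x8 map.
response_map = np.zeros((8, 8))
response_map[2:6, 2:6] = 1.0
boxes = [(2, 2, 6, 6), (0, 0, 2, 2), (0, 0, 8, 8)]
kept = filter_proposals(response_map, boxes)
```

Only the box covering the high-response square survives here, which shrinks the candidate set that any later localization step has to consider.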

    Conclusions:

    • The novel proposal-based approach offers a more direct and semantically meaningful solution for sound source localization.
    • Active Contrastive Set Mining significantly enhances the robustness of audio-visual contrastive learning.
    • The combined methods represent a significant advancement in audio-visual perception for machines.