Attention-Based Temporal-Frequency Aggregation for Speaker Verification.

Meng Wang¹, Dazheng Feng¹, Tingting Su¹

¹National Laboratory of Radar Signal Processing, Xidian University, Xi’an 710071, China.

Sensors (Basel, Switzerland)

March 26, 2022

Summary

This summary is machine-generated.

This study introduces novel temporal-frequency aggregation methods for speaker verification (SV) systems built on convolutional neural networks (CNNs). These methods make speaker embeddings more discriminative by capturing information from both the time and frequency domains, and the resulting system reaches an equal error rate (EER) of 5.96% on the VoxCeleb dataset.

Keywords:

convolutional neural networks; self-attention; speaker verification; temporal-frequency aggregation

Area of Science:

Speech processing
Machine learning
Biometrics

Background:

Convolutional neural networks (CNNs) are pivotal in speaker verification (SV) due to their feature learning capabilities.
Utterance-level aggregation in CNN-based SV compresses variable-length frame-level features into a fixed-dimensional utterance-level embedding, but often overlooks frequency-domain information.
Existing methods primarily aggregate features temporally, limiting the capture of speaker-specific frequency characteristics.
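To make this limitation concrete, here is a minimal Python sketch of temporal-only aggregation (statistics pooling). It assumes a CNN output laid out as `feature_map[c][f][t]` (channel, frequency bin, frame) and assumes the frequency axis is simply averaged away before pooling over time; the layout and the name `temporal_statistics_pooling` are illustrative, not taken from the paper.

```python
import math
from typing import List

# Hypothetical layout: feature_map[c][f][t] = activation for channel c,
# frequency bin f, frame t.
FeatureMap = List[List[List[float]]]


def temporal_statistics_pooling(feature_map: FeatureMap) -> List[float]:
    """Temporal-only aggregation sketch.

    Each channel's frequency bins are averaged with equal weight, then the
    resulting frame sequence is pooled into its mean and standard deviation.
    """
    means, stds = [], []
    for channel in feature_map:
        num_freq = len(channel)
        num_frames = len(channel[0])
        # Equal-weight average over frequency: band-specific cues are blurred.
        frames = [sum(channel[f][t] for f in range(num_freq)) / num_freq
                  for t in range(num_frames)]
        mu = sum(frames) / num_frames
        var = sum((x - mu) ** 2 for x in frames) / num_frames
        means.append(mu)
        stds.append(math.sqrt(var))
    # Utterance-level vector: per-channel means followed by per-channel stds.
    return means + stds


if __name__ == "__main__":
    # Two maps that differ only in how values are spread across bands.
    print(temporal_statistics_pooling([[[1.0, 3.0], [3.0, 5.0]]]))
    print(temporal_statistics_pooling([[[2.0, 4.0], [2.0, 4.0]]]))
```

Both calls in the usage example produce the same vector, because the two maps have identical frame-wise averages over frequency. In this sketch, any speaker information carried by how energy is distributed across bands never reaches the embedding.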

Purpose of the Study:

To address the limitations of temporal aggregation in CNN-based SV systems.
To propose novel attention-based frequency and temporal-frequency aggregation methods.
To enhance the discriminability of speaker embeddings by incorporating frequency domain information.
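The summary does not give the attention formulation, so what follows is only a minimal sketch of attention-based frequency aggregation under stated assumptions. Each frequency band is described by its time-averaged channel vector, scored with a hypothetical learned vector `w` (one entry per channel), and the scores are normalized with a softmax across bands. The weighted sum over bands keeps the time axis, so a temporal pooling step could follow.

```python
import math
from typing import List, Tuple

# Hypothetical layout: feature_map[c][f][t] (channel, frequency bin, frame).
FeatureMap = List[List[List[float]]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_frequency_aggregation(
    feature_map: FeatureMap, w: List[float]
) -> Tuple[List[List[float]], List[float]]:
    """Collapse the frequency axis with attention weights.

    w is a hypothetical learned scoring vector (one entry per channel).
    Returns (aggregated[c][t], band_weights[f]).
    """
    num_ch = len(feature_map)
    num_freq = len(feature_map[0])
    num_frames = len(feature_map[0][0])
    # Score each band from its channel vector averaged over time.
    scores = []
    for f in range(num_freq):
        descriptor = [sum(feature_map[c][f]) / num_frames for c in range(num_ch)]
        scores.append(sum(wc * dc for wc, dc in zip(w, descriptor)))
    band_weights = softmax(scores)
    # Attention-weighted sum over bands; the time axis is preserved.
    aggregated = [
        [sum(band_weights[f] * feature_map[c][f][t] for f in range(num_freq))
         for t in range(num_frames)]
        for c in range(num_ch)
    ]
    return aggregated, band_weights
```

With `w` set to zeros the band weights are uniform and the result reduces to a plain average over frequency. In this sketch, a trained `w` would instead shift weight toward the bands whose activations carry more speaker information.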

Main Methods:

Development of an attention-based frequency aggregation method to identify informative frequency bands.
Introduction of two novel temporal-frequency aggregation methods combining time and frequency domain analysis.
Implementation and evaluation of a CNN-based SV system utilizing the proposed aggregation techniques.
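One possible way to combine time and frequency analysis is attentive statistics pooling over the whole time-frequency plane. This is sketched here as an assumption and is not presented as either of the paper's two methods. Every (frequency, frame) cell is scored with a hypothetical learned vector `w`, a single softmax runs over all cells, and each channel is summarized by its attention-weighted mean and standard deviation.

```python
import math
from typing import List, Tuple

# Hypothetical layout: feature_map[c][f][t] (channel, frequency bin, frame).
FeatureMap = List[List[List[float]]]


def temporal_frequency_attentive_pooling(
    feature_map: FeatureMap, w: List[float]
) -> Tuple[List[float], List[List[float]]]:
    """Joint time-frequency attentive statistics pooling (illustrative).

    w is a hypothetical learned scoring vector (one entry per channel).
    Returns (embedding = means + stds, weights[f][t]).
    """
    num_ch = len(feature_map)
    num_freq = len(feature_map[0])
    num_frames = len(feature_map[0][0])
    # Score every time-frequency cell from its channel vector.
    scores = [[sum(w[c] * feature_map[c][f][t] for c in range(num_ch))
               for t in range(num_frames)]
              for f in range(num_freq)]
    # One softmax over all F*T cells (max subtracted for stability).
    peak = max(max(row) for row in scores)
    exps = [[math.exp(s - peak) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    weights = [[e / total for e in row] for row in exps]
    means, stds = [], []
    for c in range(num_ch):
        cells = [(f, t) for f in range(num_freq) for t in range(num_frames)]
        mu = sum(weights[f][t] * feature_map[c][f][t] for f, t in cells)
        second = sum(weights[f][t] * feature_map[c][f][t] ** 2 for f, t in cells)
        # Clamp tiny negative values caused by rounding.
        var = max(second - mu * mu, 0.0)
        means.append(mu)
        stds.append(math.sqrt(var))
    return means + stds, weights
```

With `w` set to zeros the weights are uniform. On the toy map `[[[1.0, 3.0], [3.0, 5.0]]]` the channel statistics are then a mean of 3.0 and a standard deviation of √2. The spread across frequency bands stays in the statistics instead of being averaged away before pooling.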

Main Results:

The proposed temporal-frequency aggregation method significantly improves speaker embedding discriminability.
The CNN-based SV system achieved an equal error rate (EER) of 5.96% on the VoxCeleb dataset.
Experiments on the TIMIT and VoxCeleb datasets show that the proposed methods outperform state-of-the-art baselines.
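The EER is the operating point at which the false acceptance rate (FAR) equals the false rejection rate (FRR). A minimal Python sketch of how it can be estimated from trial scores follows. It assumes trials are scored by cosine similarity between speaker embeddings, since the summary does not state the scoring back-end. It also uses a simple threshold sweep rather than interpolation, so results on small trial lists are approximate.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine score between two embeddings (assumed scoring back-end)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def equal_error_rate(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Approximate EER from trial scores.

    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    Sweeps thresholds over the observed scores and returns the mean of FAR
    and FRR at the threshold where they are closest.
    """
    targets = [s for s, y in zip(scores, labels) if y == 1]
    nontargets = [s for s, y in zip(scores, labels) if y == 0]
    if not targets or not nontargets:
        raise ValueError("need both target and non-target trials")
    best_gap, eer = float("inf"), 1.0
    for thr in sorted(set(scores)):
        # A trial is accepted when its score is >= thr.
        far = sum(s >= thr for s in nontargets) / len(nontargets)
        frr = sum(s < thr for s in targets) / len(targets)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

The function returns a fraction, so an EER of 5.96% corresponds to a return value of about 0.0596.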

Conclusions:

The novel temporal-frequency aggregation methods effectively capture speaker-dependent information from both time and frequency domains.
The proposed approach enhances the performance of CNN-based speaker verification systems.
This work offers a promising direction for improving the accuracy and robustness of speaker recognition technologies.