Attention-Based Temporal-Frequency Aggregation for Speaker Verification.

Meng Wang¹, Dazheng Feng¹, Tingting Su¹

  • ¹ National Laboratory of Radar Signal Processing, Xidian University, Xi’an 710071, China.

Sensors (Basel, Switzerland) | March 26, 2022
Summary
This summary is machine-generated.

This study introduces attention-based frequency and temporal-frequency aggregation methods for speaker verification (SV) systems built on convolutional neural networks (CNNs). By capturing both time-domain and frequency-domain information, the methods make speaker embeddings more discriminative; the resulting system reports an equal error rate (EER) of 5.96% on the VoxCeleb dataset.

Keywords:
convolutional neural networks; self-attention; speaker verification; temporal-frequency aggregation

Area of Science:

  • Speech processing
  • Machine learning
  • Biometrics

Background:

  • Convolutional neural networks (CNNs) are pivotal in speaker verification (SV) due to their feature learning capabilities.
  • Utterance-level aggregation in CNN-based SV compresses frame-level features but often overlooks frequency domain information.
  • Existing methods primarily aggregate features temporally, limiting the capture of speaker-specific frequency characteristics.
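A minimal sketch of why purely temporal aggregation can lose frequency information. It assumes a feature map indexed as `feat[channel][frequency][time]` and a pooling step that averages over frequency before averaging over time; that layout and the function name `temporal_average_pool` are illustrative assumptions, not details given in this summary.

```python
def temporal_average_pool(feat):
    """Collapse feat[c][f][t] to one value per channel.

    Assumption: frequency is averaged first, then time, as one
    simple form of utterance-level temporal aggregation.
    """
    pooled = []
    for channel in feat:
        num_bands, num_frames = len(channel), len(channel[0])
        frame_means = [
            sum(channel[f][t] for f in range(num_bands)) / num_bands
            for t in range(num_frames)
        ]
        pooled.append(sum(frame_means) / num_frames)
    return pooled


# Two toy single-channel maps: same total energy, different frequency band.
low_band = [[[2.0, 2.0], [0.0, 0.0]]]   # energy concentrated in band 0
high_band = [[[0.0, 0.0], [2.0, 2.0]]]  # energy concentrated in band 1

print(temporal_average_pool(low_band))
print(temporal_average_pool(high_band))
```

Both toy inputs pool to the same vector, so the embedding cannot tell which band carried the energy. This is the kind of gap that frequency-aware aggregation is meant to close.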

Purpose of the Study:

  • To address the limitations of temporal aggregation in CNN-based SV systems.
  • To propose novel attention-based frequency and temporal-frequency aggregation methods.
  • To enhance the discriminability of speaker embeddings by incorporating frequency domain information.

Main Methods:

  • Development of an attention-based frequency aggregation method to identify informative frequency bands.
  • Introduction of two novel temporal-frequency aggregation methods combining time and frequency domain analysis.
  • Implementation and evaluation of a CNN-based SV system utilizing the proposed aggregation techniques.
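The aggregation ideas above can be sketched roughly as follows. This is a generic attentive-pooling illustration, not the paper's exact formulation: the dot-product scoring, the weight vectors `w_time` and `w_freq`, the `feat[channel][frequency][time]` layout, and the choice to concatenate the two pooled vectors are all assumptions made for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_pool(vectors, w):
    """Weighted sum of vectors, with weights softmax(w . v)."""
    scores = [sum(wi * vi for wi, vi in zip(w, v)) for v in vectors]
    alphas = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(a * v[d] for a, v in zip(alphas, vectors)) for d in range(dim)]
    return pooled, alphas


def temporal_frequency_aggregate(feat, w_time, w_freq):
    """Map feat[c][f][t] to a concatenated embedding of length 2 * C."""
    C, F, T = len(feat), len(feat[0]), len(feat[0][0])
    # One C-dimensional vector per frame (mean over frequency).
    frames = [[sum(feat[c][f][t] for f in range(F)) / F for c in range(C)]
              for t in range(T)]
    # One C-dimensional vector per frequency band (mean over time).
    bands = [[sum(feat[c][f][t] for t in range(T)) / T for c in range(C)]
             for f in range(F)]
    time_emb, time_alphas = attentive_pool(frames, w_time)
    freq_emb, freq_alphas = attentive_pool(bands, w_freq)
    return time_emb + freq_emb, time_alphas, freq_alphas


# Toy feature map: 2 channels, 3 frequency bands, 4 frames.
feat = [[[c + 0.1 * f + 0.01 * t for t in range(4)] for f in range(3)]
        for c in range(2)]
embedding, time_alphas, freq_alphas = temporal_frequency_aggregate(
    feat, w_time=[0.5, -0.5], w_freq=[1.0, 1.0])
print(len(embedding), freq_alphas)
```

In a trained system the scoring weights would be learned jointly with the CNN. The attention weights over bands (`freq_alphas`) indicate which frequency bands the pooling treats as most informative.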

Main Results:

  • The proposed temporal-frequency aggregation method significantly improves speaker embedding discriminability.
  • The CNN-based SV system achieved an equal error rate (EER) of 5.96% on the VoxCeleb dataset.
  • Experiments on the TIMIT and VoxCeleb datasets show the proposed methods outperforming state-of-the-art baselines.
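For readers unfamiliar with the metric, the EER is the operating point at which the false acceptance rate equals the false rejection rate. The sketch below shows one simple way to estimate it from trial scores. The function name `equal_error_rate`, the threshold sweep over observed scores, and the toy data are illustrative assumptions; this is not the evaluation code used in the study.

```python
def equal_error_rate(scores, labels):
    """Estimate the EER from verification trial scores.

    labels: 1 for a same-speaker (target) trial, 0 for an impostor trial.
    A trial is accepted when its score is >= the threshold.
    """
    targets = [s for s, y in zip(scores, labels) if y == 1]
    impostors = [s for s, y in zip(scores, labels) if y == 0]
    best_gap, eer = float("inf"), None
    for threshold in sorted(set(scores)):
        far = sum(s >= threshold for s in impostors) / len(impostors)
        frr = sum(s < threshold for s in targets) / len(targets)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy trial list: one target scores below one impostor.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
print(equal_error_rate(scores, labels))  # 0.25
```

A lower EER means fewer errors of both kinds at the balanced threshold.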

Conclusions:

  • The novel temporal-frequency aggregation methods effectively capture speaker-dependent information from both time and frequency domains.
  • The proposed approach enhances the performance of CNN-based speaker verification systems.
  • This work offers a promising direction for improving the accuracy and robustness of speaker recognition technologies.