Utterance Clustering Using Stereo Audio Channels.

Yingjun Dong (1,2), Neil G MacLaren (1,3), Yiding Cao (1,2)

  • (1) Center for Collective Dynamics of Complex Systems, Binghamton University, State University of New York, Binghamton, NY 13902-6000, USA.

Computational Intelligence and Neuroscience | October 7, 2021
Summary
This summary is machine-generated.

This study improves utterance clustering by using multichannel (stereo) audio signals instead of a single mono channel, which raises speaker identification accuracy. The approach outperforms conventional mono-audio methods, including in complex multiperson discussions.

Area of Science:

  • Audio signal processing
  • Machine learning
  • Speech recognition

Background:

  • Utterance clustering groups speech segments by speaker and is a key step in separating speakers in an audio recording.
  • Current methods often rely on single-channel (mono) audio signals.
  • Improving clustering performance in complex acoustic environments remains a challenge.

Purpose of the Study:

  • To enhance utterance clustering performance by utilizing multichannel (stereo) audio signals.
  • To investigate novel methods for processing stereo audio for improved feature extraction.
  • To evaluate the effectiveness of the proposed approach against conventional mono-signal methods.

Main Methods:

  • Processed stereo audio signals by combining left and right channels.
  • Extracted embedded features, known as d-vectors, from processed audio.
  • Applied a parameter-sharing Gaussian mixture model for supervised utterance clustering.
  • Utilized maximum likelihood for speaker identification during testing.
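
The steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the channels are combined by simple averaging or differencing, short toy vectors stand in for d-vectors (real d-vectors come from a trained neural network, which is not reproduced here), and "parameter sharing" is read as one Gaussian per speaker with a single diagonal variance shared across speakers. The summary does not say which parameters the model shares. All function names are hypothetical.

```python
import math
from collections import defaultdict


def combine_channels(left, right, mode="average"):
    """Combine left/right sample sequences into one signal.

    The two modes are illustrative assumptions, not the paper's method.
    """
    if len(left) != len(right):
        raise ValueError("channels must have the same length")
    if mode == "average":
        return [(l + r) / 2.0 for l, r in zip(left, right)]
    if mode == "difference":
        return [l - r for l, r in zip(left, right)]
    raise ValueError(f"unknown mode: {mode}")


def fit_shared_variance_gaussians(embeddings, labels):
    """Fit one mean per speaker plus one diagonal variance shared by all speakers.

    Sharing the variance is one possible reading of "parameter sharing".
    """
    dim = len(embeddings[0])
    grouped = defaultdict(list)
    for vec, lab in zip(embeddings, labels):
        grouped[lab].append(vec)

    # Per-speaker mean of the labeled training embeddings.
    means = {
        lab: [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]
        for lab, vecs in grouped.items()
    }

    # Pooled (shared) variance around each speaker's own mean.
    var = [0.0] * dim
    for vec, lab in zip(embeddings, labels):
        for d in range(dim):
            var[d] += (vec[d] - means[lab][d]) ** 2
    var = [max(v / len(embeddings), 1e-6) for v in var]  # floor avoids division by zero
    return means, var


def log_likelihood(vec, mean, var):
    """Log density of vec under a diagonal Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * s) + (x - m) ** 2 / s)
        for x, m, s in zip(vec, mean, var)
    )


def identify_speaker(vec, means, var):
    """Maximum-likelihood decision: pick the speaker whose Gaussian scores highest."""
    return max(means, key=lambda lab: log_likelihood(vec, means[lab], var))


if __name__ == "__main__":
    # Toy 2-D vectors standing in for d-vectors of labeled training utterances.
    train = [[0.0, 0.1], [0.1, 0.0], [1.0, 0.9], [0.9, 1.0]]
    speakers = ["A", "A", "B", "B"]
    means, var = fit_shared_variance_gaussians(train, speakers)
    print(identify_speaker([0.05, 0.05], means, var))
    print(combine_channels([1.0, 3.0], [3.0, 5.0]))
```

Sharing one variance across speakers keeps the number of parameters small when each speaker has only a few labeled utterances. In practice a library implementation with tied covariances could replace the hand-written fit.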

Main Results:

  • The proposed multichannel audio processing method significantly improved utterance clustering performance.
  • Experimental results demonstrated superior accuracy compared to conventional mono-audio signal methods.
  • The method showed effectiveness even in complex multiperson discussion scenarios.

Conclusions:

  • Multichannel audio signal processing offers a significant advantage for utterance clustering.
  • Combining d-vector extraction with a parameter-sharing Gaussian mixture model is effective for utterance clustering.
  • This research provides a more robust solution for speaker diarization in challenging audio conditions.