Utterance Clustering Using Stereo Audio Channels.

Yingjun Dong¹,², Neil G MacLaren¹,³, Yiding Cao¹,²

¹Center for Collective Dynamics of Complex Systems, Binghamton University, State University of New York, Binghamton, NY 13902-6000, USA.

Computational Intelligence and Neuroscience

October 7, 2021

Summary

This summary is machine-generated.

This study enhances utterance clustering using multichannel audio signals, improving speaker identification accuracy. The novel approach outperforms traditional mono-audio methods in complex discussions.

Area of Science:

Audio signal processing
Machine learning
Speech recognition

Background:

Utterance clustering is crucial for separating speakers in audio.
Current methods often rely on single-channel (mono) audio signals.
Improving clustering performance in complex acoustic environments remains a challenge.

Purpose of the Study:

To enhance utterance clustering performance by utilizing multichannel (stereo) audio signals.
To investigate novel methods for processing stereo audio for improved feature extraction.
To evaluate the effectiveness of the proposed approach against conventional mono-signal methods.

Main Methods:

Processed stereo audio signals by combining left and right channels.
Extracted embedded features, known as d-vectors, from processed audio.
Applied a parameter-sharing Gaussian mixture model for supervised utterance clustering.
Utilized maximum likelihood for speaker identification during testing.
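
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The summary does not say how the left and right channels were combined, so the sum and difference signals here are assumed examples. "Parameter-sharing" is interpreted as one Gaussian per speaker with a single covariance matrix shared across speakers, which is only one possible reading. The d-vector extractor (a trained neural network) is not reproduced, so the embeddings are placeholders. All function names are hypothetical.

```python
import numpy as np


def combine_channels(left, right):
    """Build candidate signals from a stereo pair.

    Assumption: sum and difference are illustrative combinations only;
    the summary does not specify which combinations were used.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    return {"sum": 0.5 * (left + right), "diff": 0.5 * (left - right)}


def fit_shared_covariance_gaussians(embeddings, labels, reg=1e-3):
    """Fit one mean per speaker and one pooled covariance shared by all.

    Assumption: this shared-covariance model stands in for the
    "parameter-sharing" Gaussian mixture model named in the summary.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    speakers = np.unique(labels)  # sorted unique speaker labels
    means = np.stack([embeddings[labels == s].mean(axis=0) for s in speakers])
    # Center each embedding on its own speaker's mean, then pool.
    centered = embeddings - means[np.searchsorted(speakers, labels)]
    cov = centered.T @ centered / len(embeddings)
    cov += reg * np.eye(embeddings.shape[1])  # keep the matrix invertible
    return speakers, means, cov


def identify_speakers(embeddings, speakers, means, cov):
    """Pick, for each embedding, the speaker with maximum likelihood."""
    embeddings = np.asarray(embeddings, dtype=float)
    precision = np.linalg.inv(cov)
    diffs = embeddings[:, None, :] - means[None, :, :]
    # With a shared covariance the Gaussian normalizing constant is the
    # same for every speaker, so the maximum-likelihood speaker is the
    # one with the smallest Mahalanobis distance.
    mahalanobis = np.einsum("nkd,de,nke->nk", diffs, precision, diffs)
    return speakers[np.argmin(mahalanobis, axis=1)]
```

In use, each utterance would first be turned into an embedding by a d-vector extractor applied to the combined signals. `fit_shared_covariance_gaussians` would then be called on labeled training embeddings and `identify_speakers` on test embeddings.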

Main Results:

The proposed multichannel audio processing method significantly improved utterance clustering performance.
Experimental results demonstrated superior accuracy compared to conventional mono-audio signal methods.
The method showed effectiveness even in complex multiperson discussion scenarios.

Conclusions:

Multichannel audio signal processing offers a significant advantage for utterance clustering.
The developed d-vector extraction and Gaussian mixture model approach is effective.
This research provides a more robust solution for speaker diarization in challenging audio conditions.