Noise Perturbation for Supervised Speech Separation.

Jitong Chen¹, Yuxuan Wang¹, DeLiang Wang²

¹Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210.

Speech Communication

February 23, 2016

Summary

This summary is machine-generated.

Training classifiers on perturbed noise improves supervised speech separation. Of the perturbations tested, frequency perturbation proved most effective, reducing misclassification of noise as speech at low signal-to-noise ratios.

Keywords:

Speech separation; noise perturbation; supervised learning

Area of Science:

Signal Processing
Machine Learning
Acoustics

Background:

Speech separation is crucial for audio processing and is often framed as mask estimation.
Supervised methods require effective generalization from limited training data.
Nonstationary noise can cause classifiers to misidentify noise patterns as speech.
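
To make "mask estimation" concrete, here is a minimal sketch of one common training target, the ideal binary mask. The summary does not say which mask is used, so the choice of a binary mask, the function name `ideal_binary_mask`, and the 0 dB threshold `lc_db` are all assumptions made for illustration. A classifier would then be trained to predict this mask from features of the noisy mixture.

```python
import numpy as np

def ideal_binary_mask(speech_power, noise_power, lc_db=0.0):
    """Label each time-frequency unit 1 if its local SNR exceeds lc_db, else 0.

    speech_power, noise_power: arrays of per-unit power (frequency x time).
    lc_db: local criterion in dB (assumed value, not taken from the source).
    """
    eps = 1e-12  # guards against division by zero and log of zero
    local_snr_db = 10.0 * np.log10((speech_power + eps) / (noise_power + eps))
    return (local_snr_db > lc_db).astype(np.uint8)

# Toy 2x2 power "spectrograms" (made-up numbers).
speech = np.array([[4.0, 0.1], [1.0, 9.0]])
noise = np.array([[1.0, 1.0], [2.0, 3.0]])
mask = ideal_binary_mask(speech, noise)
print(mask)
```

Units where speech dominates get label 1 and the rest get label 0, which is what lets separation be treated as a classification problem.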

Purpose of the Study:

To investigate the impact of noise perturbations on supervised speech separation performance.
To evaluate three specific noise perturbations: rate, vocal tract length, and frequency.
To determine the optimal perturbation strategy for low signal-to-noise ratios (SNRs).
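
Since the study targets low SNRs, it may help to see how a training mixture at a chosen SNR can be built. The following is a generic sketch, not the authors' data pipeline; `mix_at_snr` and the random stand-in signals are hypothetical.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale noise so the speech-to-noise power ratio equals snr_db, then add."""
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Solve speech_power / (gain**2 * noise_power) = 10**(snr_db / 10) for gain.
    gain = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    scaled_noise = gain * noise
    return speech + scaled_noise, scaled_noise

rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)  # stand-in for a speech waveform
noise = rng.standard_normal(16000)   # stand-in for a noise segment
mixture, scaled_noise = mix_at_snr(speech, noise, snr_db=-5.0)

achieved = 10.0 * np.log10(np.mean(speech ** 2) / np.mean(scaled_noise ** 2))
print(round(achieved, 6))
```

At negative SNRs the noise carries more power than the speech, which is the regime where misclassifying noise as speech becomes most likely.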

Main Methods:

Trained a classifier on speech-noise mixtures in which the noise had been perturbed.
Applied noise rate, vocal tract length, and frequency perturbations.
Evaluated separation performance using classification accuracy, hit-minus-false-alarm rate, and short-time objective intelligibility (STOI).
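
One way frequency perturbation of a noise spectrogram could be implemented is sketched below. The summary gives no algorithmic details, so everything here is an assumption: each time-frequency unit gets a random displacement, the displacement field is smoothed with a moving average, and each frame is resampled along the frequency axis by linear interpolation. The names `box_smooth` and `frequency_perturb` and the parameters `max_shift` and `half_width` are hypothetical.

```python
import numpy as np

def box_smooth(field, half_width, axis):
    """Moving average along one axis, with edge padding to keep the shape."""
    kernel = np.ones(2 * half_width + 1) / (2 * half_width + 1)
    pad = [(0, 0)] * field.ndim
    pad[axis] = (half_width, half_width)
    padded = np.pad(field, pad, mode="edge")
    return np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="valid"), axis, padded
    )

def frequency_perturb(spec, max_shift=3.0, half_width=2, rng=None):
    """Warp a (frequency x time) spectrogram along the frequency axis.

    max_shift: largest displacement in frequency bins (assumed value).
    half_width: half-width of the smoothing window (assumed value).
    """
    rng = np.random.default_rng() if rng is None else rng
    n_freq, n_time = spec.shape
    # Random displacement per unit, smoothed so neighbouring units move together.
    shift = rng.uniform(-max_shift, max_shift, size=spec.shape)
    shift = box_smooth(box_smooth(shift, half_width, axis=0), half_width, axis=1)
    bins = np.arange(n_freq, dtype=float)
    out = np.empty_like(spec, dtype=float)
    for t in range(n_time):
        # Read each output bin from a displaced source position in the same frame.
        src = np.clip(bins + shift[:, t], 0, n_freq - 1)
        out[:, t] = np.interp(src, bins, spec[:, t])
    return out

rng = np.random.default_rng(0)
noise_spec = np.abs(rng.standard_normal((64, 100)))  # stand-in noise spectrogram
perturbed = frequency_perturb(noise_spec, rng=rng)
unchanged = frequency_perturb(noise_spec, max_shift=0.0, rng=rng)
print(perturbed.shape)
```

Applying a fresh random warp each time a noise segment is reused yields many variants of the same recording, which is the sense in which perturbation expands limited training noise.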

Main Results:

Frequency perturbation demonstrated superior performance compared to noise rate and vocal tract length perturbations.
Frequency perturbation significantly reduced the misclassification of noise patterns as speech.
All evaluated metrics showed improvement with frequency perturbation at low SNRs.
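
For reference, the hit-minus-false-alarm rate can be computed from an estimated and an ideal binary mask as sketched below. HIT is the fraction of speech-dominant units correctly labeled 1, and FA is the fraction of noise-dominant units wrongly labeled 1. The function name `hit_minus_fa` and the toy masks are illustrative and are not data from the study.

```python
import numpy as np

def hit_minus_fa(estimated_mask, ideal_mask):
    """Return (HIT - FA, HIT, FA) for two binary masks of equal shape."""
    est = np.asarray(estimated_mask, dtype=bool)
    ref = np.asarray(ideal_mask, dtype=bool)
    # max(..., 1) avoids division by zero when a class is absent.
    hit = np.sum(est & ref) / max(np.sum(ref), 1)
    fa = np.sum(est & ~ref) / max(np.sum(~ref), 1)
    return hit - fa, hit, fa

ideal = np.array([[1, 1, 0, 0], [1, 0, 0, 0]])
estimate = np.array([[1, 0, 1, 0], [1, 0, 0, 0]])
score, hit, fa = hit_minus_fa(estimate, ideal)
print(score, hit, fa)
```

A classifier that mistakes noise patterns for speech raises FA, so a reduction in that error shows up directly as a higher HIT-FA score.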

Conclusions:

Noise perturbation, particularly frequency perturbation, enhances supervised speech separation.
Frequency perturbation is an effective technique for improving classifier robustness against nonstationary noise.
This method offers a promising approach for better speech separation in challenging acoustic environments.