On Training Targets for Supervised Speech Separation.

Yuxuan Wang¹, Arun Narayanan¹, DeLiang Wang²

  • ¹ Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210 USA.

IEEE/ACM Transactions on Audio, Speech, and Language Processing | January 20, 2015
Summary
This summary is machine-generated.

This study compares different training targets for supervised speech separation. Ratio-mask targets, such as the ideal ratio mask (IRM), yield substantially better objective speech intelligibility and quality than the other targets and methods examined.

Keywords:
Deep neural networks; speech separation; supervised learning; training targets

Area of Science:

  • Signal Processing
  • Machine Learning
  • Speech Technology

Background:

  • Speech separation is often framed as a supervised learning task, typically using deep neural networks.
  • The ideal binary mask (IBM) is a common training target and is known to improve speech intelligibility.
  • Framing separation as supervised learning makes it possible to train toward targets other than binary masks.
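
To make the IBM concrete, here is a minimal Python sketch that labels each time-frequency (T-F) unit by comparing its local SNR with a local criterion. It assumes the per-unit speech and noise energies were already computed from the premixed signals by some T-F front end (not shown). The function name `ideal_binary_mask`, the nested-list layout, and the default criterion `lc_db=-5.0` are illustrative assumptions, not details taken from this summary.

```python
import math


def ideal_binary_mask(speech_power, noise_power, lc_db=-5.0):
    """Ideal binary mask (IBM) over a T-F grid (hypothetical helper).

    speech_power, noise_power: lists of rows (frames) holding per-unit
    energies of the premixed clean speech and noise.
    lc_db: local criterion in dB; the -5 dB default is an assumption.
    A unit is labeled 1 when its local SNR exceeds lc_db, else 0.
    """
    eps = 1e-12  # guards against log(0) and division by zero in silent units
    mask = []
    for s_row, n_row in zip(speech_power, noise_power):
        row = []
        for s, n in zip(s_row, n_row):
            local_snr_db = 10.0 * math.log10((s + eps) / (n + eps))
            row.append(1 if local_snr_db > lc_db else 0)
        mask.append(row)
    return mask
```

Because the mask is built from the premixed speech and noise, it is available only for training data; at test time a network has to estimate it from the mixture. Passing the energy of a reference speech-shaped noise in place of the actual noise energy gives a TBM-style label under the same thresholding, though this sketch does not reproduce the study's exact reference signal.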

Purpose of the Study:

  • To evaluate and compare the effectiveness of various training targets for supervised speech separation.
  • To determine which targets yield the best objective intelligibility and quality metrics.
  • To compare supervised methods against traditional techniques like non-negative matrix factorization.

Main Methods:

  • Training supervised learning models toward different targets: the ideal binary mask (IBM), the target binary mask (TBM), the ideal ratio mask (IRM), the short-time Fourier transform (STFT) spectral magnitude and its corresponding mask (FFT-MASK), and the Gammatone frequency power spectrum.
  • Evaluating separation performance using objective intelligibility and quality metrics.
  • Comparing results with non-negative matrix factorization and other speech enhancement methods.
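
The two ratio-type targets listed above can be sketched in the same style. The IRM below uses the common form (S² / (S² + N²))^β, and FFT-MASK is the ratio of clean-speech to mixture STFT magnitude. The function names, the exponent `beta=0.5`, and the clipping bound `max_value=10.0` are assumptions made for illustration; the summary does not state which values the study used.

```python
def ideal_ratio_mask(speech_power, noise_power, beta=0.5):
    """IRM(t,f) = (S^2 / (S^2 + N^2)) ** beta per T-F unit (hypothetical helper).

    speech_power / noise_power hold per-unit energies S^2 and N^2.
    beta is a tunable exponent; 0.5 is an assumed value here.
    """
    eps = 1e-12  # keeps the ratio defined when both energies are zero
    return [
        [(s / (s + n + eps)) ** beta for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech_power, noise_power)
    ]


def fft_mask(speech_mag, mixture_mag, max_value=10.0):
    """FFT-MASK(t,f) = |S(t,f)| / |Y(t,f)| from STFT magnitudes (hypothetical helper).

    The ratio is unbounded, so it is clipped to [0, max_value];
    the clipping bound is an assumption of this sketch.
    """
    eps = 1e-12  # avoids division by zero in empty mixture units
    return [
        [min(max_value, s / (y + eps)) for s, y in zip(s_row, y_row)]
        for s_row, y_row in zip(speech_mag, mixture_mag)
    ]
```

Both targets are continuous rather than binary: the IRM lies in [0, 1], while FFT-MASK can exceed 1 when speech and noise partially cancel in the mixture, which is why this sketch clips it.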

Main Results:

  • Ideal ratio mask (IRM) and FFT-MASK targets outperform other tested targets in objective intelligibility and quality.
  • Masking-based targets generally yield superior results compared to spectral envelope-based targets.
  • Supervised speech separation demonstrates clear performance advantages over non-negative matrix factorization and traditional speech enhancement techniques.
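
The summary does not describe how an estimated target is turned back into a speech estimate. A common approach, assumed in this sketch, is to scale the mixture STFT by an estimated mask, or, for a spectral-magnitude target, to pair the predicted magnitude with the mixture phase; an inverse STFT (not shown) would then resynthesize the waveform. The function names and the reuse of the mixture phase are assumptions, not details confirmed by the source.

```python
import cmath


def apply_mask(mixture_stft, mask):
    """Masking-based target: scale each complex mixture T-F unit by its mask value.

    Multiplying by a real, non-negative gain changes only the magnitude,
    so the mixture phase is kept (an assumption of this sketch).
    """
    return [
        [m * y for y, m in zip(y_row, m_row)]
        for y_row, m_row in zip(mixture_stft, mask)
    ]


def combine_with_mixture_phase(mixture_stft, predicted_mag):
    """Spectral target: attach a directly predicted magnitude to the mixture phase."""
    return [
        [cmath.rect(a, cmath.phase(y)) for y, a in zip(y_row, a_row)]
        for y_row, a_row in zip(mixture_stft, predicted_mag)
    ]
```

The contrast is that a mask only reweights energy already present in the mixture, whereas a spectral target asks the network to produce the magnitude itself.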

Conclusions:

  • Ratio-mask targets are highly effective for supervised speech separation, giving the largest gains in objective intelligibility and quality among the targets tested.
  • The supervised learning framework provides a powerful and flexible approach to speech separation.
  • Modern supervised methods can substantially improve speech separation performance over traditional techniques.