A Deep Ensemble Learning Method for Monaural Speech Separation.

Xiao-Lei Zhang¹, DeLiang Wang¹

  • ¹ Department of Computer Science and Engineering and the Center for Cognitive and Brain Sciences, The Ohio State University, Columbus, OH 43210 USA, and also with the Center of Intelligent Acoustics and Immersive Communications, Northwestern Polytechnical University, Xi'an 710072, China.

IEEE/ACM Transactions on Audio, Speech, and Language Processing | December 6, 2016
Summary
This summary is machine-generated.

This study introduces multicontext networks for monaural speech separation, which improve performance by combining multiple deep neural networks (DNNs) that use different context window lengths. It also finds that predicting ideal time-frequency masks makes more efficient use of training data than predicting clean speech directly.

Keywords:
Deep neural networks, ensemble learning, mapping-based separation, masking-based separation, monaural speech separation, multicontext networks

Area of Science:

  • Speech Processing
  • Artificial Intelligence
  • Machine Learning

Background:

  • Monaural speech separation recovers target speech from a single-microphone recording, where no spatial cues are available.
  • Deep neural networks (DNNs) have advanced speech separation, but a single DNN makes limited use of contextual information, and the relative merits of different optimization objectives are not well understood.
  • Existing DNN methods often use a fixed window length, limiting their ability to capture diverse speech characteristics.

Purpose of the Study:

  • To propose a novel deep ensemble method, multicontext networks, for enhanced monaural speech separation.
  • To address the limited use of contextual information by a single DNN and to clarify how the choice of optimization objective affects separation.
  • To systematically compare the effectiveness of predicting clean speech versus ideal time-frequency masks.

Main Methods:

  • Developed two multicontext network architectures: one averaging outputs of DNNs with different window lengths, and another stacking DNNs with varied contexts.
  • Each DNN in the stacked network processes concatenated acoustic features and expanded soft outputs from lower modules.
  • Conducted extensive experiments on three speech corpora to validate the proposed method.
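
The two architectures above can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: `stack_context`, `make_toy_predictor`, `multicontext_average`, and `stacked_input` are hypothetical names, the "DNN" is a single random sigmoid layer standing in for a trained network, and the feature sizes and window lengths are arbitrary assumptions.

```python
import numpy as np


def stack_context(frames, half_window):
    """Concatenate each frame with `half_window` neighbours on either side.

    frames: array of shape (num_frames, dim).
    Returns shape (num_frames, (2 * half_window + 1) * dim).
    Edges are padded by repeating the first and last frame.
    """
    padded = np.pad(frames, ((half_window, half_window), (0, 0)), mode="edge")
    num_frames = frames.shape[0]
    shifted = [padded[k:k + num_frames] for k in range(2 * half_window + 1)]
    return np.concatenate(shifted, axis=1)


def make_toy_predictor(input_dim, num_bins, seed):
    """Hypothetical stand-in for a trained DNN: one random sigmoid layer."""
    rng = np.random.default_rng(seed)
    weights = rng.normal(scale=0.1, size=(input_dim, num_bins))
    return lambda x: 1.0 / (1.0 + np.exp(-(x @ weights)))


def multicontext_average(features, predictors):
    """Average the estimates of predictors that use different window lengths.

    predictors: list of (half_window, predict_fn) pairs.
    """
    estimates = [predict(stack_context(features, w)) for w, predict in predictors]
    return np.mean(estimates, axis=0)


def stacked_input(features, lower_output, half_window):
    """Input for an upper module: expanded features plus expanded lower outputs."""
    return np.concatenate(
        [stack_context(features, half_window),
         stack_context(lower_output, half_window)],
        axis=1,
    )


# Toy data: 50 frames of 8-dimensional acoustic features, 16 frequency bins.
num_frames, num_features, num_bins = 50, 8, 16
features = np.random.default_rng(0).normal(size=(num_frames, num_features))

# Three predictors that see 1, 3, and 5 frames of context respectively.
predictors = [
    (w, make_toy_predictor((2 * w + 1) * num_features, num_bins, seed=w))
    for w in (0, 1, 2)
]
averaged = multicontext_average(features, predictors)
upper_in = stacked_input(features, averaged, half_window=1)
```

Here `averaged` corresponds to the averaging architecture, and `upper_in` is what a module one level up in the stacking architecture would receive: the context-expanded acoustic features concatenated with the context-expanded soft outputs of the level below.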

Main Results:

  • The proposed multicontext networks significantly improve monaural speech separation performance.
  • Demonstrated the effectiveness of ensemble methods in overcoming single DNN limitations.
  • Found that predicting ideal time-frequency masks makes more efficient use of training data, while predicting clean speech is more robust to variations in signal-to-noise ratio (SNR).
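
To make the two training targets concrete, the sketch below computes one of each from toy power spectra. It rests on assumptions: the summary does not say which mask definition the study used, so the ideal ratio mask shown here is only one common choice, and the log-power spectrum is one common form of a mapping target. The function names and data are hypothetical, and the example does not reproduce the study's comparison.

```python
import numpy as np


def ideal_ratio_mask(speech_power, noise_power, eps=1e-12):
    """Masking-based target: a value in [0, 1] per time-frequency unit."""
    return np.sqrt(speech_power / (speech_power + noise_power + eps))


def log_power_target(speech_power, eps=1e-12):
    """Mapping-based target: the clean-speech log-power spectrum itself."""
    return np.log(speech_power + eps)


# Toy (frames, bins) power spectra; real ones would come from an STFT.
rng = np.random.default_rng(0)
speech_power = rng.uniform(0.1, 1.0, size=(50, 16))
noise_power = rng.uniform(0.1, 1.0, size=(50, 16))

mask = ideal_ratio_mask(speech_power, noise_power)
target = log_power_target(speech_power)

# Scale the whole recording by a factor of 4 in power.
gain = 4.0
mask_scaled = ideal_ratio_mask(gain * speech_power, gain * noise_power)
target_scaled = log_power_target(gain * speech_power)

# The mask depends only on the speech-to-noise ratio in each unit,
# whereas the mapping target shifts by log(gain) with the signal level.
mask_unchanged = np.allclose(mask, mask_scaled)
target_shifted = np.allclose(target_scaled - target, np.log(gain))
```

The mask is bounded and insensitive to overall signal level, while the mapping target lives on the scale of the speech itself, so the two objectives pose rather different regression problems to a network.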

Conclusions:

  • Multicontext networks offer a superior approach to monaural speech separation compared to single DNN models.
  • The study provides valuable insights into the comparative advantages of different DNN optimization objectives in speech separation.
  • The findings contribute to the development of more robust and effective speech processing technologies.