Long short-term memory for speaker generalization in supervised speech separation.

Jitong Chen¹, DeLiang Wang¹

¹Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210, USA.

The Journal of the Acoustical Society of America

July 7, 2017

Summary

This summary is machine-generated.

This study introduces a Long Short-Term Memory (LSTM) model for speech separation, significantly improving performance on unseen speakers and noises compared to deep neural networks (DNNs). The LSTM model enhances speech intelligibility and is efficient for low-latency applications.

Area of Science:

Signal Processing
Machine Learning
Acoustics

Background:

Speech separation is crucial for enhancing audio quality in noisy environments.
Supervised speech separation models struggle with generalization to new noises and speakers.
Deep Neural Networks (DNNs) show promise but have limitations in modeling diverse speaker characteristics.

Purpose of the Study:

To develop a speech separation model that improves generalization to unseen speakers and noises.
To leverage the temporal modeling capabilities of Long Short-Term Memory (LSTM) networks for enhanced speaker generalization.
To evaluate the proposed LSTM-based model against DNN-based approaches for speech intelligibility and efficiency.

Main Methods:

Formulating speech separation as estimating a time-frequency mask from acoustic features.
Developing a novel speech separation model utilizing Long Short-Term Memory (LSTM) architecture.
Conducting systematic evaluations comparing the LSTM model with a DNN-based model on objective speech intelligibility metrics.
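
To make the mask-estimation formulation concrete, here is a minimal sketch of a time-frequency mask and how it is applied to a mixture. The summary does not say which mask the authors use; the ideal ratio mask below is one common choice and is an assumption, as are the function names `ideal_ratio_mask` and `apply_mask` and the random toy spectrograms.

```python
import numpy as np

def ideal_ratio_mask(speech_mag, noise_mag, eps=1e-12):
    """Ratio mask in [0, 1] for each time-frequency unit.

    Assumed training target for illustration; not confirmed by the summary.
    """
    speech_pow = speech_mag ** 2
    noise_pow = noise_mag ** 2
    return np.sqrt(speech_pow / (speech_pow + noise_pow + eps))

def apply_mask(mixture_mag, mask):
    """Scale each time-frequency unit of the mixture by its mask value."""
    return mixture_mag * mask

# Toy magnitude spectrograms: 5 frames x 8 frequency bins (random placeholders).
rng = np.random.default_rng(0)
speech_mag = rng.random((5, 8))
noise_mag = rng.random((5, 8))

# Mixture magnitude under the simplifying assumption of uncorrelated speech and noise.
mixture_mag = np.sqrt(speech_mag ** 2 + noise_mag ** 2)

mask = ideal_ratio_mask(speech_mag, noise_mag)
estimate = apply_mask(mixture_mag, mask)
```

In supervised separation, a model is trained to predict such a mask from acoustic features of the mixture alone, because the clean speech and noise are not available at test time.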

Main Results:

The proposed LSTM model significantly outperforms the DNN-based model on unseen speakers and noises.
LSTM's internal representations demonstrate effective capture of long-term speech contexts.
The LSTM model shows advantages for low-latency speech separation, even without future frame information.
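
The low-latency point can be illustrated with a causal, frame-by-frame recurrent estimator: each output frame depends only on the current and earlier input frames. The sketch below uses a standard single-layer LSTM cell with random weights; the layer count, sizes, and names such as `causal_mask_estimates` are hypothetical and do not reproduce the authors' architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One standard LSTM step. Shapes: W (4H, D), U (4H, H), b (4H,)."""
    H = h.shape[0]
    z = W @ x + U @ h + b
    i = sigmoid(z[0:H])          # input gate
    f = sigmoid(z[H:2 * H])      # forget gate
    o = sigmoid(z[2 * H:3 * H])  # output gate
    g = np.tanh(z[3 * H:4 * H])  # candidate cell update
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

def causal_mask_estimates(features, W, U, b, V, d):
    """Emit one mask frame per input frame, using no future frames."""
    H = U.shape[1]
    h = np.zeros(H)
    c = np.zeros(H)
    masks = []
    for x in features:  # process frames in time order
        h, c = lstm_step(x, h, c, W, U, b)
        masks.append(sigmoid(V @ h + d))  # mask values in (0, 1)
    return np.stack(masks)

# Hypothetical sizes: D features, H hidden units, F frequency bins, T frames.
rng = np.random.default_rng(0)
D, H, F, T = 6, 4, 8, 10
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
V = rng.normal(scale=0.1, size=(F, H))
d = np.zeros(F)
features = rng.normal(size=(T, D))

masks = causal_mask_estimates(features, W, U, b, V, d)

# Causality check: perturbing frames 7 onward must not change outputs for frames 0 to 6.
perturbed = features.copy()
perturbed[7:] += 1.0
masks_perturbed = causal_mask_estimates(perturbed, W, U, b, V, d)
```

The recurrent state `(h, c)` is what lets the model carry long-term context forward without buffering future frames, which is the property the result above relies on.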

Conclusions:

The LSTM-based model offers an effective solution for speaker- and noise-independent speech separation.
LSTM networks provide superior generalization capabilities compared to traditional DNNs for this task.
The proposed approach is promising for real-time and robust speech enhancement applications.