Related Concept Videos

Integration by Parts: Problem Solving

Smart speakers process voice commands by modeling audio inputs as piecewise functions and analyzing them through integration against trigonometric functions, such as cosine. This mathematical approach is fundamental in signal processing, where complex sound waves are decomposed into simpler frequency components. Consider a definite integral involving a piecewise function multiplied by a cosine function. Because the function is defined differently over separate intervals, the integral is split...

You might also read

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

Large Language Models Reveal the Neural Tracking of Linguistic Context in Attended and Unattended Multi-Talker Speech.

bioRxiv : the preprint server for biology·2026

Same author

Large language models reveal the neural tracking of linguistic context in attended and unattended multi-talker speech.

Imaging neuroscience (Cambridge, Mass.)·2026

Same author

Real-time brain-controlled selective hearing enhances speech perception in multi-talker environments.

Nature neuroscience·2026

Same author

From Selective Listening to Brain-Controlled Hearing: A Perspective on the Future of Auditory Technology.

Journal of the Association for Research in Otolaryngology : JARO·2026

Same author

Convolutional neural network models describe the encoding subspace of local circuits in auditory cortex.

Nature neuroscience·2026

Same author

Speaker Identity is Robustly Encoded in Spatial Patterns of Intracranial EEG for Attention Decoding.

bioRxiv : the preprint server for biology·2025

Same journal

Distributionally Robust Feature Selection.

Advances in neural information processing systems·2026

Same journal

On the Identifiability of Hybrid Deep Generative Models: Meta-Learning as a Solution.

Advances in neural information processing systems·2026

Same journal

Unlocking hidden biomolecular conformational landscapes in diffusion models at inference time.

Advances in neural information processing systems·2026

Same journal

JADE: Joint Alignment and Deep Embedding for Multi-Slice Spatial Transcriptomics.

Advances in neural information processing systems·2026

Same journal

Learning to Route: Per-Sample Adaptive Routing for Multimodal Multitask Prediction.

Advances in neural information processing systems·2026

Same journal

Emergence and Evolution of Interpretable Concepts in Diffusion Models.

Advances in neural information processing systems·2026

Related Experiment Video

Updated: May 5, 2026

A Method for Tracking the Time Evolution of Steady-State Evoked Potentials

Published on: May 25, 2019

Understanding Adaptive, Multiscale Temporal Integration In Deep Speech Recognition Systems.

Menoua Keshishian¹, Sam V Norman-Haignere¹, Nima Mesgarani¹

¹Department of Electrical Engineering, Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027.

Advances in Neural Information Processing Systems

May 13, 2024

Summary

This summary is machine-generated.

Deep neural networks (DNNs) trained to recognize speech integrate information across multiple timescales. This study reveals a hierarchical organization in such DNNs, with early layers using time-yoked integration and later layers using structure-yoked integration.

More Related Videos

Foreign Accent and Forensic Speaker Identification in Voice Lineups: The Influence of Acoustic Features Based on Prosody

Published on: September 27, 2024

Author Spotlight: Investigating the Impact of Emotional Prosodies on Voice Recognition and Perception

Published on: August 9, 2024

Area of Science:

Computational neuroscience
Machine learning
Speech processing

Background:

Natural speech signals exhibit hierarchical structures across multiple timescales.
Deep neural networks (DNNs) are effective at pattern recognition, but their temporal integration mechanisms are not well understood.

Purpose of the Study:

To investigate temporal integration in DNNs using the temporal context invariance (TCI) paradigm.
To understand how DNNs, specifically DeepSpeech2, process speech across different timescales.

Main Methods:

Applied the TCI paradigm to measure temporal integration windows in DNN units.
Analyzed responses to stimulus segments in varying contexts.
Investigated integration window changes during training and with time-stretched/compressed speech.
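The core TCI logic can be illustrated with a toy simulation. This is a simplified sketch under stated assumptions, not the study's code. The "unit" is a hypothetical causal moving average over a fixed number of samples. The same random segments are played in two shuffled orders, which act as two contexts, and the unit's response at each segment's final sample is correlated across the two contexts. Once a segment is at least as long as the integration window, the surrounding context cannot affect that response, so the correlation reaches 1. The shortest such duration therefore serves as a crude window estimate. The names boxcar_unit, cross_context_correlation and estimate_window, and the 0.95 threshold, are illustrative choices. The published analysis may differ, for example by examining responses at every lag within a segment rather than only at its end.

```python
import random


def boxcar_unit(signal, window):
    """Hypothetical toy unit: causal moving average over the last `window` samples.

    Samples before the start of the signal are treated as zeros.
    """
    out = []
    for t in range(len(signal)):
        start = max(0, t - window + 1)
        out.append(sum(signal[start:t + 1]) / window)
    return out


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def cross_context_correlation(unit, seg_dur, n_segments=200, seed=0):
    """Play the same random segments in two shuffled orders (two contexts) and
    correlate the unit's response at each segment's last sample across contexts."""
    rng = random.Random(seed)
    segments = [[rng.gauss(0.0, 1.0) for _ in range(seg_dur)] for _ in range(n_segments)]
    responses_by_context = []
    for _ in range(2):  # context A, then context B
        order = list(range(n_segments))
        rng.shuffle(order)
        signal = [x for idx in order for x in segments[idx]]
        resp = unit(signal)
        at_end = [0.0] * n_segments  # indexed by segment identity, not position
        for pos, idx in enumerate(order):
            at_end[idx] = resp[(pos + 1) * seg_dur - 1]
        responses_by_context.append(at_end)
    return pearson(responses_by_context[0], responses_by_context[1])


def estimate_window(unit, max_dur=32, threshold=0.95):
    """Crude window estimate: shortest segment duration (in samples) whose
    cross-context correlation reaches `threshold`; None if none does."""
    for dur in range(1, max_dur + 1):
        if cross_context_correlation(unit, dur) >= threshold:
            return dur
    return None
```

For a toy unit with an 8-sample window, estimate_window(lambda s: boxcar_unit(s, 8)) should recover 8. With shorter segments, part of the window covers neighboring segments that differ between the two orderings, which pulls the correlation down to roughly (segment duration) / (window).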

Main Results:

Most DNN units exhibit compact integration windows: their responses depend mainly on a limited span of the input.
Training shrinks integration windows in early layers and expands them in later layers, forming a hierarchy of timescales.
A transition point in the layer hierarchy was identified, beyond which integration windows become structure-yoked (e.g., scaling with phoneme duration) rather than fixed in absolute time.
Similar phenomena were observed in recurrent and convolutional networks, with structure-yoked integration more prominent in recurrent networks.
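A unit can be classed as time-yoked or structure-yoked in one simple way, assuming its integration window has already been measured on the same speech played at several stretch factors (values above 1 meaning slower, stretched speech). The idea is to take the slope of log window against log stretch factor. A slope near 0 means the window stays fixed in absolute time. A slope near 1 means it scales with the stimulus, as a window tied to phoneme duration would. The functions yoking_slope and classify_unit, and the 0.5 cutoff, are hypothetical and may not match the statistic used in the study.

```python
import math


def yoking_slope(windows_by_stretch):
    """Least-squares slope of log(window) versus log(stretch factor).

    windows_by_stretch maps a stretch factor (>1 = slower, stretched speech)
    to the integration window measured for one unit at that factor.
    A slope near 0 suggests a time-yoked window (fixed in absolute time);
    a slope near 1 suggests a structure-yoked window (scales with the speech).
    """
    if len(windows_by_stretch) < 2:
        raise ValueError("need windows at two or more distinct stretch factors")
    xs = [math.log(s) for s in windows_by_stretch]
    ys = [math.log(w) for w in windows_by_stretch.values()]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def classify_unit(windows_by_stretch, cutoff=0.5):
    """Hypothetical labeling rule: slopes above `cutoff` count as structure-yoked."""
    if yoking_slope(windows_by_stretch) > cutoff:
        return "structure-yoked"
    return "time-yoked"
```

Consider made-up windows of 40 ms at stretch factors 0.5, 1 and 2: these give a slope of 0 (up to rounding), and the unit is labeled time-yoked. Made-up windows of 50, 100 and 200 ms at the same factors give a slope of 1, and the unit is labeled structure-yoked. Fitting in log-log space treats compression and stretching symmetrically.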

Conclusions:

DNNs employ a hierarchical motif for speech encoding: short, time-yoked windows in early layers and long, structure-yoked windows in later layers.
The TCI paradigm offers a versatile tool for analyzing temporal integration in complex machine learning models.