



Understanding Adaptive, Multiscale Temporal Integration In Deep Speech Recognition Systems.

Menoua Keshishian¹, Sam V Norman-Haignere¹, Nima Mesgarani¹

  • ¹Department of Electrical Engineering, Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027.

Advances in Neural Information Processing Systems | May 13, 2024
Summary
This summary is machine-generated.

Deep neural networks (DNNs) trained for speech recognition integrate information across multiple timescales. This study reveals a hierarchy across layers: early layers integrate over short, time-yoked windows, while later layers integrate over longer, structure-yoked windows.


Area of Science:

  • Computational neuroscience
  • Machine learning
  • Speech processing

Background:

  • Natural speech signals exhibit hierarchical structures across multiple timescales.
  • Deep neural networks (DNNs) are effective at pattern recognition, but how they integrate information over time is not well understood.

Purpose of the Study:

  • To investigate temporal integration in DNNs using the temporal context invariance (TCI) paradigm.
  • To understand how DNNs, specifically DeepSpeech2, process speech across different timescales.

Main Methods:

  • Applied the TCI paradigm to measure temporal integration windows in DNN units.
  • Analyzed responses to stimulus segments in varying contexts.
  • Investigated integration window changes during training and with time-stretched/compressed speech.
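
The core idea behind the TCI measurement can be illustrated with a toy example. The sketch below is not the authors' implementation: it stands in a simple moving-average filter for a DNN unit, embeds the same random segment in two different random contexts, and correlates the responses at the end of the segment. The function names, the threshold, and the use of white-noise stimuli are all assumptions made for illustration. When the segment is at least as long as the unit's integration window, the response no longer depends on the surrounding context and the cross-context correlation approaches 1.

```python
import numpy as np

def unit_response(signal, window):
    """Toy causal unit (hypothetical stand-in for a DNN unit):
    the mean of the most recent `window` samples."""
    kernel = np.ones(window) / window
    return np.convolve(signal, kernel, mode="full")[: len(signal)]

def cross_context_correlation(window, seg_len, n_segments=200, ctx_len=64, seed=0):
    """Correlate end-of-segment responses to the same segments
    presented after two different random contexts."""
    rng = np.random.default_rng(seed)
    resp_a, resp_b = [], []
    for _ in range(n_segments):
        segment = rng.standard_normal(seg_len)
        ctx_a = rng.standard_normal(ctx_len)
        ctx_b = rng.standard_normal(ctx_len)
        resp_a.append(unit_response(np.concatenate([ctx_a, segment]), window)[-1])
        resp_b.append(unit_response(np.concatenate([ctx_b, segment]), window)[-1])
    return float(np.corrcoef(resp_a, resp_b)[0, 1])

def estimate_integration_window(window, seg_lens, threshold=0.99):
    """Return the shortest tested segment length whose response is
    (nearly) context invariant, or None if no length qualifies."""
    for seg_len in seg_lens:
        if cross_context_correlation(window, seg_len) >= threshold:
            return seg_len
    return None

if __name__ == "__main__":
    for seg_len in [2, 4, 8, 16, 32]:
        r = cross_context_correlation(window=16, seg_len=seg_len)
        print(f"segment length {seg_len:2d}: cross-context correlation {r:.2f}")
```

In this toy setting the correlation grows with segment length and saturates once the segment covers the whole window, so the estimate is limited to the segment lengths that were tested. For a real network, `unit_response` would be replaced by the activation of a chosen unit.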

Main Results:

  • Most DNN units exhibit compact integration windows.
  • Training leads to shrinking integration windows in early layers and expanding windows in later layers, forming a hierarchy.
  • A transition point was identified where integration windows become structure-yoked (scaling with the duration of speech structures such as phonemes) rather than time-yoked (fixed in absolute time).
  • Similar phenomena were observed in recurrent and convolutional networks, with structure-yoked integration more prominent in recurrent networks.
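
One way to make the time-yoked versus structure-yoked distinction concrete is to compare integration windows measured on time-compressed and time-stretched speech. The sketch below defines a hypothetical index for that comparison; the name `yoking_index`, its arguments, and the formula are assumptions for illustration, and the study's own metric may differ. A window that stays fixed in absolute time gives an index of 0, and a window that scales in proportion to the speech duration gives an index of 1.

```python
import math

def yoking_index(window_compressed, window_stretched, scale_compressed, scale_stretched):
    """Hypothetical structure-yoking index.

    window_*: integration windows (e.g. in ms) measured on compressed
              and stretched speech.
    scale_*:  duration scaling factors applied to the speech
              (e.g. 0.5 for compressed, 2.0 for stretched).

    Returns the change in log window per change in log duration:
    0 for a purely time-yoked unit, 1 for a purely structure-yoked unit.
    """
    log_window_ratio = math.log(window_stretched / window_compressed)
    log_scale_ratio = math.log(scale_stretched / scale_compressed)
    return log_window_ratio / log_scale_ratio

if __name__ == "__main__":
    # Window unchanged by stretching: time-yoked.
    print(yoking_index(100.0, 100.0, 0.5, 2.0))
    # Window scales with speech duration: structure-yoked.
    print(yoking_index(50.0, 200.0, 0.5, 2.0))
```

Working in log ratios makes the index symmetric between compression and stretching and independent of the units used for the window.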

Conclusions:

  • DNNs employ a hierarchical motif for speech encoding: short, time-yoked windows in early layers and long, structure-yoked windows in later layers.
  • The TCI paradigm offers a versatile tool for analyzing temporal integration in complex machine learning models.