Multistream articulatory feature-based models for visual speech recognition.

Kate Saenko¹, Karen Livescu, James Glass

  • ¹MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA 02139, USA. saenko@csail.mit.edu

IEEE Transactions on Pattern Analysis and Machine Intelligence | July 4, 2009 | PubMed
Summary
This summary is machine-generated.

This study introduces articulatory feature (AF)-based dynamic Bayesian network (DBN) models for visual speech recognition (VSR). These models outperform baseline models on both a medium-vocabulary word-ranking task and a small-vocabulary phrase recognition task.

Area of Science:

  • Computer Science
  • Artificial Intelligence
  • Signal Processing

Background:

  • Automatic visual speech recognition (VSR) aims to interpret speech from visual cues.
  • Dynamic Bayesian Networks (DBNs) offer a framework for modeling sequential data, including speech.
  • Articulatory features (AFs), such as lip opening and lip rounding, are informative cues for VSR but are challenging to model directly.

Purpose of the Study:

  • To develop and evaluate DBN-based models for VSR that leverage articulatory features.
  • To compare the performance of AF-based VSR models against baseline methods.
  • To investigate the impact of different model configurations and input types on VSR accuracy.

Main Methods:

  • Utilized dynamic Bayesian network (DBN) models with multiple hidden state sequences representing articulatory features (AFs).
  • Used a bank of discriminative AF classifiers to provide input to the DBN, either as virtual evidence (VE), i.e., scaled likelihoods, or as raw classifier margin outputs.
  • Conducted experiments on medium-vocabulary word-ranking and small-vocabulary phrase recognition tasks.
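
The methods above can be illustrated with a minimal Python sketch of how classifier outputs might enter a word model as virtual evidence and then be used to rank candidate words. Everything specific here is an assumption for illustration: a single AF stream (lip opening) with a hypothetical three-value inventory, left-to-right word models with a constant self-loop probability, and made-up posteriors, priors, and word labels. The paper's DBNs use multiple AF streams and their own parameterization, which this sketch does not reproduce.

```python
import math

# Hypothetical value set for one articulatory feature, lip opening (LO).
# Word-model states below are indices into this list; the paper's inventory may differ.
LO_VALUES = ["closed", "narrow", "wide"]


def virtual_evidence(posteriors, priors):
    """Turn classifier posteriors P(v | x_t) into scaled likelihoods P(v | x_t) / P(v).

    By Bayes' rule P(x_t | v) = P(v | x_t) P(x_t) / P(v), so the ratio is proportional
    to the likelihood, up to a per-frame factor P(x_t) shared by every hypothesis.
    """
    return [[p / priors[v] for v, p in enumerate(frame)] for frame in posteriors]


def _log(x):
    # Floor tiny values so log() never receives zero.
    return math.log(max(x, 1e-300))


def _logaddexp(a, b):
    """Stable log(exp(a) + exp(b)), allowing either argument to be -inf."""
    if a == float("-inf"):
        return b
    if b == float("-inf"):
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def word_log_score(ve, states, self_loop=0.5):
    """Forward log score of a hypothetical left-to-right word model over one AF stream.

    ve:        per-frame scaled likelihoods, ve[t][v]
    states:    AF value index for each word-model state
    self_loop: assumed constant probability of staying in the current state
    """
    neg_inf = float("-inf")
    alpha = [neg_inf] * len(states)
    alpha[0] = _log(ve[0][states[0]])  # the model must start in its first state
    for frame in ve[1:]:
        new = []
        for i, v in enumerate(states):
            stay = alpha[i] + _log(self_loop)
            move = alpha[i - 1] + _log(1 - self_loop) if i > 0 else neg_inf
            new.append(_logaddexp(stay, move) + _log(frame[v]))
        alpha = new
    return alpha[-1]  # the model must end in its last state


def rank_words(ve, lexicon):
    """Return word labels sorted from best to worst forward score."""
    scores = {w: word_log_score(ve, s) for w, s in lexicon.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy input: the classifier sees the lips closed for two frames, then wide open.
posteriors = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.1, 0.1, 0.8], [0.1, 0.2, 0.7]]
priors = [0.5, 0.3, 0.2]
lexicon = {"word_a": [0, 2], "word_b": [2, 0], "word_c": [1, 1]}  # hypothetical words
ve = virtual_evidence(posteriors, priors)
print(rank_words(ve, lexicon))  # ['word_a', 'word_b', 'word_c']
```

Dividing by the prior is what lets a discriminative classifier's posterior stand in for a likelihood; the leftover factor P(x_t) is identical for every word at a given frame, so it does not change the ranking.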

Main Results:

  • Articulatory feature-based DBN models outperformed baseline models.
  • Investigated the effects of allowing articulatory asynchrony, of using dictionary-based versus whole-word models, and of using alternative observation models.
  • Demonstrated the effectiveness of virtual evidence for incorporating classifier outputs.
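
Articulatory asynchrony can be pictured as letting each AF stream move through its own sequence of states while bounding how far the streams may drift apart. The Python sketch below is one way to encode that idea, under assumptions that are illustrative and not taken from the paper: two streams (lip opening and lip rounding), a hypothetical cap `max_async` on the difference between their state indices, transitions that are uniform over the allowed successors, streams treated as conditionally independent given the joint state, and toy virtual-evidence values.

```python
import math
from itertools import product


def logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b)) for finite a and b."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def joint_states(n_states, max_async):
    """Joint (i, j) state indices of two streams, kept only if |i - j| <= max_async."""
    return [(i, j) for i, j in product(range(n_states), repeat=2) if abs(i - j) <= max_async]


def joint_successors(state, n_states, max_async):
    """Each stream either stays or advances one state; drop disallowed joint states."""
    i, j = state
    out = []
    for di, dj in product((0, 1), repeat=2):
        ni, nj = i + di, j + dj
        if ni < n_states and nj < n_states and abs(ni - nj) <= max_async:
            out.append((ni, nj))
    return out


def async_word_log_score(ve_streams, word_streams, max_async):
    """Forward log score of a hypothetical two-stream left-to-right word model.

    ve_streams[k][t][v]: virtual evidence for value v of stream k at frame t
    word_streams[k][i]:  AF value of state i in stream k (both streams have n states)
    Transitions are uniform over allowed successors, an illustrative assumption.
    """
    n = len(word_streams[0])

    def log_obs(t, s):
        # Product of per-stream evidence, i.e., a sum of logs.
        return sum(math.log(ve_streams[k][t][word_streams[k][s[k]]]) for k in range(2))

    alpha = {(0, 0): log_obs(0, (0, 0))}
    for t in range(1, len(ve_streams[0])):
        incoming = {}
        for s, a in alpha.items():
            succ = joint_successors(s, n, max_async)
            for nxt in succ:
                score = a - math.log(len(succ))
                incoming[nxt] = logaddexp(incoming[nxt], score) if nxt in incoming else score
        alpha = {s: a + log_obs(t, s) for s, a in incoming.items()}
    return alpha.get((n - 1, n - 1), float("-inf"))


# Toy data: lip opening changes value at frame 1, lip rounding only at frame 2.
hi, lo = 1.8, 0.2
lip_opening = [[hi, lo], [lo, hi], [lo, hi], [lo, hi]]
lip_rounding = [[hi, lo], [hi, lo], [lo, hi], [lo, hi]]
word = [[0, 1], [0, 1]]  # both streams go from value 0 to value 1
sync = async_word_log_score([lip_opening, lip_rounding], word, max_async=0)
loose = async_word_log_score([lip_opening, lip_rounding], word, max_async=1)
print(len(joint_states(3, 0)), len(joint_states(3, 1)))  # 3 7
print(loose > sync)  # True
```

With `max_async=0` the two streams must change state together, which collapses to a single synchronous state sequence. Raising the cap enlarges the joint state space and, on this toy input where lip rounding lags lip opening by one frame, yields a higher score.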

Conclusions:

  • AF-based DBN models represent a promising approach to enhancing visual speech recognition.
  • The findings highlight the importance of modeling articulatory dynamics for improved VSR accuracy.
  • Further research can explore more advanced DBN architectures and feature extraction techniques for VSR.