Lipreading from color video.

G I Chiou¹, J N Hwang

¹Dept. of Electr. Eng., Washington Univ., Seattle, WA.

IEEE Transactions on Image Processing : a Publication of the IEEE Signal Processing Society

January 1, 1997

Summary

This summary is machine-generated.

This study presents a lipreading system that recognizes words using only color video of the lips. The visual-only system achieved 94% accuracy on ten isolated words.

Area of Science:

Computer Vision
Biomedical Engineering
Speech Recognition

Background:

Traditional speech recognition relies heavily on acoustic data, limiting its use in noisy environments or for individuals with speech impairments.
Visual speech information, particularly lip movements, offers a complementary modality for speech recognition.
Developing robust lipreading systems is crucial for advancing human-computer interaction and assistive technologies.

Purpose of the Study:

To design and implement a novel lipreading system capable of recognizing isolated words using solely visual information from lip movements.
To evaluate the system's performance and accuracy in a controlled setting without acoustic data.
To explore the efficacy of combining advanced image processing and machine learning techniques for visual speech recognition.

Main Methods:

Utilized "snakes" (active contour models) to extract visual features from the geometric space of lip movements.
Applied Karhunen-Loeve transform (KLT) to identify principal components within the color eigenspace of lip images.
Employed hidden Markov models (HMMs) for the sequential recognition of combined visual features.
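
The KLT step listed above can be illustrated with a minimal sketch. It assumes each lip image is flattened into one row vector and uses synthetic random frames in place of real video; the function names `klt_basis` and `klt_project` are hypothetical and are not taken from the paper.

```python
import numpy as np

def klt_basis(images, k):
    """Return the mean and the top-k KLT (PCA) basis vectors.

    images: array of shape (n_samples, n_features), one flattened
    color lip image per row (an assumed layout, not the paper's).
    """
    mean = images.mean(axis=0)
    centered = images - mean
    # Rows of vt are eigenvectors of the sample covariance matrix,
    # ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]

def klt_project(image, mean, basis):
    """Project one flattened image onto the KLT basis."""
    return basis @ (image - mean)

# Synthetic stand-in data: 20 frames of 8x8 RGB pixels.
rng = np.random.default_rng(0)
frames = rng.random((20, 8 * 8 * 3))
mean, basis = klt_basis(frames, k=5)
coeffs = klt_project(frames[0], mean, basis)
```

Each frame is reduced to `k` coefficients, which could then be combined with contour features to form the per-frame observation passed to a sequence model.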

Main Results:

The lipreading system demonstrated high accuracy, achieving a 94% recognition rate for ten isolated words.
The system successfully performed word recognition using only color video of human lips, without any acoustic data.
The combination of "snakes," KLT, and HMMs proved effective for visual feature extraction and sequence recognition.
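
Isolated-word recognition with HMMs is commonly done by training one model per word and choosing the word whose model assigns the highest likelihood to the observed sequence. The sketch below shows that decision rule under simplifying assumptions: observations are discrete symbols (for example, vector-quantized feature vectors), and the two toy models and their parameters are invented for illustration. The summary does not state the HMM configuration used in the study.

```python
import numpy as np

def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM."""
    alpha = start * emit[:, obs[0]]
    log_p = 0.0
    for symbol in obs[1:]:
        scale = alpha.sum()
        log_p += np.log(scale)
        # Normalize, propagate through the transitions, then weight
        # by the emission probability of the current symbol.
        alpha = (alpha / scale) @ trans * emit[:, symbol]
    return log_p + np.log(alpha.sum())

def classify(obs, models):
    """Return the word whose HMM gives the highest log-likelihood."""
    return max(models, key=lambda word: log_likelihood(obs, *models[word]))

# Hypothetical two-state left-to-right models for two words.
start = np.array([1.0, 0.0])
trans = np.array([[0.7, 0.3],
                  [0.0, 1.0]])
models = {
    "yes": (start, trans, np.array([[0.9, 0.1], [0.1, 0.9]])),
    "no":  (start, trans, np.array([[0.1, 0.9], [0.9, 0.1]])),
}

predicted = classify([0, 0, 1, 1], models)
```

Scaling the forward variable at each step avoids numerical underflow on long sequences while leaving the final log-likelihood unchanged.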

Conclusions:

Lipreading systems utilizing only visual information can achieve high accuracy in recognizing isolated words.
The developed system offers a promising alternative or supplement to acoustic-based speech recognition, particularly in challenging acoustic environments.
This research highlights the potential of advanced computer vision and machine learning techniques for robust visual speech recognition applications.