Related Concept Videos

Downsampling

When a sampled sequence has zero values between its sampling instants, it can be replaced by the sequence formed from every N-th value. At integer multiples of N, the original and sampled sequences coincide. This process, known as decimation, extracts every N-th sample from a sequence and yields a sequence N times shorter that still contains every nonzero sample.
The Fourier transform of the decimated sequence reveals a combination of scaled and shifted versions of the original spectrum. This...
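
A minimal Python sketch of the two operations described above, assuming sequences are plain lists indexed from n = 0. `sample_with_zeros` and `decimate` are hypothetical helper names chosen for illustration. This bare version applies no anti-aliasing filter, so it avoids aliasing only if the input is already suitably band-limited.

```python
from typing import List, Sequence


def sample_with_zeros(x: Sequence[float], N: int) -> List[float]:
    """Keep x[n] where n is a multiple of N and set every other entry to zero."""
    if N < 1:
        raise ValueError("N must be a positive integer")
    return [v if n % N == 0 else 0.0 for n, v in enumerate(x)]


def decimate(x: Sequence[float], N: int) -> List[float]:
    """Take every N-th sample of x, i.e. x_d[n] = x[n * N] (no anti-aliasing filter)."""
    if N < 1:
        raise ValueError("N must be a positive integer")
    return list(x[::N])


if __name__ == "__main__":
    x = [float(n) for n in range(10)]
    print(sample_with_zeros(x, 3))  # zeros between the retained samples
    print(decimate(x, 3))           # the retained samples only
```

Decimating the zero-filled sequence returns exactly the same values as decimating the original, which is the sense in which the two sequences coincide at multiples of N.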

Upsampling

Managing signal sampling rates is essential in digital signal processing to maintain signal integrity. A decimated signal, which occupies a reduced frequency range because of its lower sampling rate, can be upsampled by inserting zeros between consecutive samples. Zero insertion compresses the original spectrum along the frequency axis and introduces repeated spectral replicas (images) at regular intervals determined by the upsampling factor. To refine this zero-inserted sequence, it is passed through a lowpass filter with a cutoff...
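
The sketch below illustrates zero insertion followed by interpolation filtering in pure Python. As an assumption, it substitutes a simple triangular (linear-interpolation) kernel for the lowpass filter. This kernel is only a crude lowpass, and the helper names are hypothetical. Its coefficients sum to L, which restores the amplitude lost by inserting L - 1 zeros per sample.

```python
from typing import List, Sequence


def upsample_zeros(x: Sequence[float], L: int) -> List[float]:
    """Insert L - 1 zeros after every sample: y[n * L] = x[n], zero elsewhere."""
    y = [0.0] * (len(x) * L)
    for n, v in enumerate(x):
        y[n * L] = float(v)
    return y


def triangular_kernel(L: int) -> List[float]:
    """Linear-interpolation kernel h[k] = 1 - |k| / L for |k| < L (sums to L)."""
    return [1.0 - abs(k) / L for k in range(-L + 1, L)]


def convolve_same(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Centered convolution returning len(x) outputs; samples outside x count as 0."""
    c = len(h) // 2
    out = []
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            m = n - (k - c)
            if 0 <= m < len(x):
                acc += hk * x[m]
        out.append(acc)
    return out


if __name__ == "__main__":
    up = upsample_zeros([0.0, 3.0, 6.0], 3)
    print(up)
    print(convolve_same(up, triangular_kernel(3)))
```

With x = [0, 3, 6] and L = 3, the filtered output at indices 0 through 6 rises linearly from 0 to 6; the final samples taper off because the sketch treats everything past the end of the input as zero.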

Buffer Effectiveness

Buffer solutions do not have an unlimited capacity to keep the pH relatively constant. Instead, the ability of a buffer solution to resist changes in pH relies on the presence of appreciable amounts of both members of its conjugate weak acid-base pair. When enough strong acid or base is added to substantially lower the concentration of either member of the buffer pair, the buffering action within the solution is compromised.
The buffer capacity is the amount of acid or base that can be added to a given volume...
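
A short illustration of why buffer capacity is finite, assuming the Henderson-Hasselbalch approximation holds and that the weak acid and its conjugate base share one volume, so mole ratios can stand in for concentration ratios. The pKa of 4.76 (acetic acid) and the mole amounts are illustrative values, and the function names are hypothetical.

```python
import math
from typing import Tuple


def henderson_hasselbalch_ph(pka: float, base_mol: float, acid_mol: float) -> float:
    """pH = pKa + log10([A-]/[HA]); mole ratio equals concentration ratio in one volume."""
    if base_mol <= 0 or acid_mol <= 0:
        raise ValueError("both members of the buffer pair must be present")
    return pka + math.log10(base_mol / acid_mol)


def add_strong_acid(base_mol: float, acid_mol: float, h_mol: float) -> Tuple[float, float]:
    """Added H+ converts conjugate base into weak acid (A- + H+ -> HA)."""
    if h_mol >= base_mol:
        raise ValueError("added acid consumes all conjugate base: capacity exceeded")
    return base_mol - h_mol, acid_mol + h_mol


if __name__ == "__main__":
    pka = 4.76  # illustrative: acetic acid
    ph_before = henderson_hasselbalch_ph(pka, 0.10, 0.10)
    base, acid = add_strong_acid(0.10, 0.10, 0.01)
    ph_after = henderson_hasselbalch_ph(pka, base, acid)
    print(ph_before, ph_after)
```

Adding 0.01 mol of strong acid to the equimolar buffer lowers the pH by less than 0.1 unit, while adding more acid than there is conjugate base exhausts the buffer, which the sketch reports as an error.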

Reconstruction of Signal using Interpolation

Signal processing techniques are essential for accurately converting continuous signals to digital formats and vice versa. When a continuous signal is sampled with a period T, the spectrum of the sampled signal contains replicas of the original spectrum spaced at intervals equal to the sampling frequency. To reconstruct a continuous signal from these samples, a zero-order hold can be applied: it creates a piecewise-constant signal by retaining each sample's value until the next...
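
A minimal sketch of a zero-order hold evaluated at an arbitrary time t, assuming the first sample is taken at t = 0 and the output is zero outside the sampled interval (both are modeling choices, not part of the description above). The function name is hypothetical.

```python
import math
from typing import Sequence


def zero_order_hold(samples: Sequence[float], T: float, t: float) -> float:
    """Hold samples[n] over n*T <= t < (n+1)*T; return 0 outside the sampled interval."""
    if T <= 0:
        raise ValueError("sampling period T must be positive")
    n = math.floor(t / T)
    if n < 0 or n >= len(samples):
        return 0.0
    return float(samples[n])


if __name__ == "__main__":
    samples = [1.0, 2.0, 3.0]
    # Evaluate on a grid twice as fine as the sampling period to see the staircase.
    print([zero_order_hold(samples, 0.5, k * 0.25) for k in range(6)])
```

Evaluating the hold every T/2 with samples [1, 2, 3] and T = 0.5 produces the staircase [1, 1, 2, 2, 3, 3].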

Pulse amplitude and quality

Pulse amplitude is a crucial indicator of cardiac health because it provides valuable insights into the strength of left ventricular contractions and the overall uniformity of blood circulation within the vasculature. The strength of the pulse is directly related to the force with which the heart contracts and the volume of blood being pumped.
A weak or absent pulse may indicate reduced cardiac output or poor left ventricular contraction, which can be signs of cardiovascular dysfunction or...

Reducing Line Loss

In a three-phase circuit, line loss is the energy dissipated as heat in the resistance of the transmission lines. One way to reduce it is to add two three-phase transformers to the system: a step-up transformer at the source and a step-down transformer at the load.
With a step-up transformer at the source, the voltage is increased, thereby reducing the current in the transmission lines since power loss...
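
The effect of stepping up the voltage can be checked numerically. This sketch assumes a balanced three-phase load, so the line current is I = P / (sqrt(3) V_L pf) and the total loss in the three conductors is 3 I^2 R. The power, voltages, and resistance below are illustrative values, and the function names are hypothetical.

```python
import math


def line_current(p_watts: float, v_line: float, pf: float = 1.0) -> float:
    """Line current of a balanced three-phase load: I = P / (sqrt(3) * V_L * pf)."""
    return p_watts / (math.sqrt(3) * v_line * pf)


def line_loss(p_watts: float, v_line: float, r_per_line: float, pf: float = 1.0) -> float:
    """Total I^2 R loss in the three line conductors: 3 * I^2 * R."""
    i = line_current(p_watts, v_line, pf)
    return 3 * i ** 2 * r_per_line


if __name__ == "__main__":
    p, r = 1e6, 2.0                     # illustrative: 1 MW delivered, 2 ohm per line
    low = line_loss(p, 4160.0, r)       # without step-up
    high = line_loss(p, 41600.0, r)     # after a 10x step-up transformer
    print(low, high, low / high)
```

Because the loss equals P^2 R / (V_L pf)^2, a tenfold voltage step-up for the same delivered power cuts the line loss by a factor of 100.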

You might also read

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

An Investigation of the Phonetic Variation of the Word-Initial /l/ and /n/ Across Regional Varieties of Mandarin.

Language and speech·2026

Same author

Diffusing caveolin-1 scaffolds regulate mechanosignalling.

Nature cell biology·2026

Same author

HDCluster: High-Degree Graph Clustering for Robust Analysis of Single Molecule Localization Microscopy.

bioRxiv : the preprint server for biology·2025

Same author

nERdy: network analysis of endoplasmic reticulum dynamics.

Communications biology·2025

Same author

The Interaction of Target and Masker Speech in Competing Speech Perception.

Brain sciences·2025

Same author

Physician-in-the-Loop Active Learning in Radiology Artificial Intelligence Workflows: Opportunities, Challenges, and Future Directions.

AJR. American journal of roentgenology·2025

Same journal

Retraction Note: An adaptive speech signal processing for COVID-19 detection using deep learning approach.

International journal of speech technology·2022

Same journal

The perception of emotional cues by children in artificial background noise.

International journal of speech technology·2021

Same journal

An adaptive speech signal processing for COVID-19 detection using deep learning approach.

International journal of speech technology·2021

Same journal

A novel stochastic deep resilient network for effective speech recognition.

International journal of speech technology·2021

Same journal

RETRACTED ARTICLE: AI driven feature extraction model for chest cavity spectrum signal visualization.

International journal of speech technology·2021

Same journal

Public opinion mining using natural language processing technique for improvisation towards smart city.

International journal of speech technology·2020

Related Experiment Video

Updated: Aug 4, 2025

Author Spotlight: Investigating the Impact of Emotional Prosodies on Voice Recognition and Perception

Published on: August 9, 2024

Plain-to-clear speech video conversion for enhanced intelligibility.

Shubam Sachdeva¹, Haoyao Ruan¹, Ghassan Hamarneh²

¹Language and Brain Lab, Department of Linguistics, Simon Fraser University, Burnaby, BC Canada.

International Journal of Speech Technology

April 3, 2023

Summary

This summary is machine-generated.

Researchers enhanced visual speech cues in videos to improve speech intelligibility. Modifying plain speech videos with clear speech features boosted AI lip-reading accuracy and shows potential for human training.

Keywords:

AI lip reading; Intelligibility; Speech enhancement; Speech style; Video speech synthesis

More Related Videos

Systematic Hearing Performance Evaluation Process for Adolescents with Cochlear Implantation at Early Ages

Published on: March 24, 2023

A Protocol for Comprehensive Assessment of Bulbar Dysfunction in Amyotrophic Lateral Sclerosis (ALS)

Published on: February 21, 2011

Area of Science:

Speech processing and computer vision
Human-computer interaction
Audiology and speech-language pathology

Background:

Clearly articulated speech demonstrably improves intelligibility compared to plain speech.
Visual speech cues in video-only formats are crucial for speech perception, especially in noisy environments.
Systematic modification of visual speech features to enhance intelligibility remains an underexplored area.

Purpose of the Study:

To investigate the systematic modification of visual speech cues to enhance clear-speech features.
To improve speech intelligibility using synthesized clear-speech videos derived from plain speech.
To evaluate the effectiveness of these synthesized videos using both AI lip-reading and human intelligibility tests.

Main Methods:

Extraction of clear-speech visual features from videos of English words with varying vowels.
Application of extracted features to plain speech videos using an image-warping technique with a 'displacement factor'.
Synthesis of novel clear-speech videos and evaluation via state-of-the-art AI lip readers and human participants.
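
The summary does not specify how the 'displacement factor' is defined. Purely as a hypothetical illustration, and not the authors' published method, the sketch below treats it as a scalar that moves each plain-speech facial landmark along a straight line toward its clear-speech counterpart: 0 keeps the plain position, 1 reaches the clear position, and values above 1 exaggerate the change. The landmark representation and the function name are assumptions, and the image-warping step that would render the moved landmarks into video frames is omitted.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # hypothetical (x, y) landmark coordinate in pixels


def displace_landmarks(plain: List[Point], clear: List[Point], factor: float) -> List[Point]:
    """HYPOTHETICAL: move each plain landmark p toward its clear counterpart c
    as p + factor * (c - p). Assumes both lists describe the same landmarks in
    the same order; this is an illustrative linear scaling only."""
    if len(plain) != len(clear):
        raise ValueError("landmark lists must have the same length")
    return [
        (px + factor * (cx - px), py + factor * (cy - py))
        for (px, py), (cx, cy) in zip(plain, clear)
    ]


if __name__ == "__main__":
    plain = [(0.0, 0.0), (10.0, 0.0)]   # e.g. two lip-corner points, plain style
    clear = [(0.0, 2.0), (12.0, 0.0)]   # same points, clear style
    for f in (0.0, 0.5, 1.0, 1.5):
        print(f, displace_landmarks(plain, clear, f))
```

A linear scaling keeps the illustration simple; a real system would also need landmark detection, alignment between talkers, and a warping method to produce frames.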

Main Results:

Successfully extracted and applied visual cues to enhance speech intelligibility for AI lip readers.
Demonstrated that universal, talker-independent clear-speech features can modify visual speech styles.
Introduced the 'displacement factor' for quantifiable scaling of visual modifications between speech styles.

Conclusions:

Systematic enhancement of visual speech features can significantly improve AI-based speech intelligibility.
The findings suggest the potential for talker-independent visual speech modification techniques.
Generated high-definition videos are suitable for future human-centric intelligibility and perceptual training studies.