Updated: Jan 16, 2026

Leveraging Self-Supervised Audio-Visual Pretrained Models to Improve Vocoded Speech Intelligibility in Cochlear

Richard Lee Lai, Jen-Cheng Hou, I-Chun Chern

IEEE Transactions on Bio-Medical Engineering

October 2, 2025

Summary

This summary is machine-generated.

This study introduces Self-Supervised Learning-based Audio-Visual Speech Enhancement (SSL-AVSE), evaluated in cochlear implant simulations, to improve speech understanding for individuals with hearing impairments. By integrating visual cues, SSL-AVSE significantly enhances speech quality and intelligibility.

Area of Science:

Audiology
Speech Processing
Machine Learning

Background:

Hearing impairments present significant challenges in speech comprehension, especially in noisy environments.
Cochlear implants (CIs) aim to restore hearing but can struggle with speech intelligibility, particularly with processed (vocoded) speech.
Audio-visual speech enhancement (AVSE) offers a potential solution by leveraging visual cues like lip movements.
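
CI processing is commonly simulated with a channel vocoder, which keeps only the slow amplitude envelope in a few frequency bands. The summary does not say which vocoder the study used, so the following is a generic tone-vocoder sketch, not the authors' implementation. The function name `tone_vocoder`, the band edges, the 50 Hz envelope cutoff, and the 1200 Hz test tone are all illustrative assumptions.

```python
import numpy as np

def tone_vocoder(signal, sample_rate, band_edges_hz, envelope_cutoff_hz=50.0):
    """Crude channel vocoder: each band's envelope modulates a sine carrier."""
    n = len(signal)
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)
    t = np.arange(n) / sample_rate
    output = np.zeros(n)
    for low, high in zip(band_edges_hz[:-1], band_edges_hz[1:]):
        # Band-pass by zeroing FFT bins outside [low, high).
        in_band = (freqs >= low) & (freqs < high)
        band = np.fft.irfft(np.where(in_band, spectrum, 0), n=n)
        # Envelope: full-wave rectify, then low-pass by FFT masking.
        rectified_spectrum = np.fft.rfft(np.abs(band))
        keep = freqs <= envelope_cutoff_hz
        envelope = np.fft.irfft(np.where(keep, rectified_spectrum, 0), n=n)
        envelope = np.maximum(envelope, 0.0)
        # Carrier: sine at the geometric centre of the band.
        carrier = np.sin(2 * np.pi * np.sqrt(low * high) * t)
        output += envelope * carrier
    return output

# Hypothetical settings: 16 kHz audio, five bands, a 1200 Hz test tone.
SAMPLE_RATE = 16000
BAND_EDGES_HZ = [200, 500, 1000, 2000, 4000, 7000]
t = np.arange(SAMPLE_RATE // 2) / SAMPLE_RATE
tone = np.sin(2 * np.pi * 1200 * t)
vocoded = tone_vocoder(tone, SAMPLE_RATE, BAND_EDGES_HZ)
```

Because only band envelopes survive, the fine spectral detail of the input is discarded: here the 1200 Hz tone comes out as energy at the carrier of the 1000 to 2000 Hz band. That loss of detail is why vocoded speech is hard to understand in noise.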

Purpose of the Study:

To evaluate the effectiveness of a novel Self-Supervised Learning-based Audio-Visual Speech Enhancement (SSL-AVSE) framework for improving vocoded speech intelligibility in cochlear implant (CI) simulations.
To investigate the performance of SSL-AVSE compared to existing methods.
To assess the cross-lingual generalization capabilities of the proposed model.

Main Methods:

Developed the SSL-AVSE framework, integrating visual speech cues (lip/mouth movements) with audio.
Utilized the AV-HuBERT model for feature extraction and a bidirectional LSTM for refinement.
Conducted experiments on the Taiwan Mandarin Speech with Video (TMSV) dataset.
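
To make the "bidirectional LSTM for refinement" step concrete, here is a minimal NumPy sketch of a bidirectional LSTM forward pass over a sequence of feature frames. It does not load AV-HuBERT: the `features` array is a random stand-in for pretrained audio-visual features, and the layer sizes, weight initialisation, and function names are assumptions made for illustration, not details from the study.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_pass(frames, W, U, b):
    """Run one LSTM direction over frames (T, D); return hidden states (T, H)."""
    hidden = U.shape[1]
    h, c = np.zeros(hidden), np.zeros(hidden)
    states = []
    for x in frames:
        # Pre-activations for the input, forget, candidate, and output gates.
        i, f, g, o = np.split(W @ x + U @ h + b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        states.append(h)
    return np.stack(states)

def bilstm(frames, fwd_params, bwd_params):
    """Concatenate a left-to-right pass with a right-to-left pass, per frame."""
    forward = lstm_pass(frames, *fwd_params)
    backward = lstm_pass(frames[::-1], *bwd_params)[::-1]
    return np.concatenate([forward, backward], axis=1)

def init_params(rng, feat_dim, hidden):
    """Small random weights; a trained model would learn these instead."""
    return (rng.normal(0, 0.1, (4 * hidden, feat_dim)),
            rng.normal(0, 0.1, (4 * hidden, hidden)),
            np.zeros(4 * hidden))

rng = np.random.default_rng(0)
T, FEAT_DIM, HIDDEN = 20, 8, 5                 # hypothetical sizes
features = rng.normal(size=(T, FEAT_DIM))      # stand-in for pretrained features
fwd = init_params(rng, FEAT_DIM, HIDDEN)
bwd = init_params(rng, FEAT_DIM, HIDDEN)
refined = bilstm(features, fwd, bwd)
print(refined.shape)  # (20, 10)
```

Each output frame combines context from earlier frames (forward half) and later frames (backward half), which is the property that makes a bidirectional layer a natural choice for refining whole-utterance features offline.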

Main Results:

Objective metrics improved: PESQ (Perceptual Evaluation of Speech Quality) rose from 1.43 to 1.67, and STOI (Short-Time Objective Intelligibility) rose from 0.70 to 0.74.
NCM (normalized covariance measure) scores increased by up to 87.2% relative to the noisy baseline.
Subjective listening tests revealed maximum gains of 45.2% in speech quality and 51.9% in word intelligibility.
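
The PESQ and STOI results are given as before/after scores, while the NCM and listening-test results are given as percentages. As a rough aid for comparing them, the sketch below converts the two reported score pairs into relative gains. The helper name `relative_gain` is an assumption, and the summary does not state how its own percentages were computed.

```python
def relative_gain(before: float, after: float) -> float:
    """Return the relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0

# Before/after scores as reported in the summary.
reported = {"PESQ": (1.43, 1.67), "STOI": (0.70, 0.74)}

gains = {name: relative_gain(b, a) for name, (b, a) in reported.items()}
for name, (before, after) in reported.items():
    print(f"{name}: {before:.2f} -> {after:.2f} "
          f"(+{after - before:.2f} absolute, +{gains[name]:.1f}% relative)")
```

This gives relative gains of about 16.8% for PESQ and 5.7% for STOI.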

Conclusions:

SSL-AVSE outperforms audio-only speech enhancement (AOSE) and conventional AVSE baselines in CI simulations.
Listening tests showed statistically significant gains, confirming the effectiveness of SSL-AVSE.
The model exhibits cross-lingual generalization, performing effectively on Mandarin speech despite English pretraining, highlighting the robustness of foundation model features.