



Noise-Robust Multimodal Audio-Visual Speech Recognition System for Speech-Based Interaction Applications.

Sanghun Jeon¹, Mun Sang Kim¹

  • ¹Center for Healthcare Robotics, Gwangju Institute of Science and Technology (GIST), School of Integrated Technology, Gwangju 61005, Korea.

Sensors (Basel, Switzerland) | October 27, 2022
Summary
This summary is machine-generated.

This study introduces a robust speech recognition system for noisy environments by combining audio and visual data. Multimodal input significantly improves recognition accuracy, making systems more reliable in real-world applications.

Keywords:
audiovisual speech recognition; deep learning; edutainment; lipreading; multimodal interaction; virtual aquarium


Area of Science:

  • Human-Computer Interaction
  • Artificial Intelligence
  • Signal Processing

Background:

  • Speech recognition is crucial for edutainment systems but struggles in noisy real-world environments.
  • Ambient noise significantly limits the effectiveness of single-mode speech interaction systems.
  • Existing systems lack robustness, hindering seamless user-system interaction in diverse settings.

Purpose of the Study:

  • To develop a noise-robust, multimodal interaction system for virtual aquarium environments using speech.
  • To enhance speech recognition accuracy by integrating audio and visual information.
  • To improve the reliability of speech-based interaction in real-world applications.

Main Methods:

  • Proposed a multimodal system combining audio-based speech recognition (using a speech API and pretrained word vectors) and vision-based speech recognition (using a deep neural network).
  • Concatenated vectors from both audio and visual modalities for classification.
  • Evaluated the system's signal-to-noise ratio in four noise environments and compared its accuracy and efficiency against single-mode approaches.
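
The concatenation step above can be illustrated with a minimal sketch. It is not the authors' implementation: the vector sizes, the number of command classes, the random stand-in features, and the linear softmax classifier are all assumptions chosen only to show how an audio-derived vector and a visual-derived vector might be joined into one input for classification.

```python
import math
import random


def fuse_features(audio_vec, visual_vec):
    """Join the two modality vectors end to end (feature-level fusion)."""
    return list(audio_vec) + list(visual_vec)


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(fused, weights, bias):
    """Hypothetical linear classifier: one weight row and one bias per class."""
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    probs = softmax(logits)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs


# Assumed sizes, for illustration only (not taken from the paper).
AUDIO_DIM, VISUAL_DIM, NUM_COMMANDS = 300, 128, 10

rng = random.Random(0)
# Random stand-ins for a word-vector embedding and a lipreading feature vector.
audio_vec = [rng.gauss(0.0, 1.0) for _ in range(AUDIO_DIM)]
visual_vec = [rng.gauss(0.0, 1.0) for _ in range(VISUAL_DIM)]

fused = fuse_features(audio_vec, visual_vec)

# Untrained random weights; a real system would learn these from data.
weights = [
    [rng.gauss(0.0, 0.01) for _ in range(AUDIO_DIM + VISUAL_DIM)]
    for _ in range(NUM_COMMANDS)
]
bias = [0.0] * NUM_COMMANDS

label, probs = classify(fused, weights, bias)
print(len(fused), label)
```

Because the classifier sees both vectors at once, it can lean on the visual features when the audio features are degraded by noise, which is the intuition behind the fusion design described above.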

Main Results:

  • The multimodal system achieved an average recognition rate of 98.12%, 6.7 percentage points higher than speech-only recognition (91.42%).
  • Demonstrated superior accuracy and efficiency compared to existing single-mode visual feature extraction and audio speech recognition methods.
  • The system proved robust across various noise environments.
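
As a quick check on the reported figures, the sketch below computes the gain between the two recognition rates in three common ways. Only the two rates (91.42% and 98.12%) come from the summary; the function name and the choice of derived metrics are illustrative assumptions.

```python
def improvement_summary(baseline_pct, improved_pct):
    """Return (absolute gain in points, relative gain in %, error reduction in %)."""
    absolute_points = improved_pct - baseline_pct
    # Gain expressed relative to the baseline recognition rate.
    relative_pct = 100.0 * absolute_points / baseline_pct
    # Share of the baseline's errors that the improved system removes.
    error_reduction_pct = 100.0 * absolute_points / (100.0 - baseline_pct)
    return absolute_points, relative_pct, error_reduction_pct


absolute, relative, error_cut = improvement_summary(91.42, 98.12)
print(f"absolute gain:   {absolute:.2f} percentage points")
print(f"relative gain:   {relative:.2f} %")
print(f"error reduction: {error_cut:.2f} %")
```

The absolute difference is 6.70 percentage points, which matches the reported improvement; the relative gain over the baseline is about 7.33%, and the error rate falls from 8.58% to 1.88%, a reduction of roughly 78%.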

Conclusions:

  • Combining audio and visual information significantly enhances speech recognition robustness and accuracy in noisy conditions.
  • The proposed multimodal approach offers a viable solution for reliable speech interaction in diverse real-world settings.
  • This technology has broad applicability in public spaces like cafés, museums, and kiosks.