Multimodal Sparse Transformer Network for Audio-Visual Speech Recognition.

Qiya Song, Bin Sun, Shutao Li

IEEE Transactions on Neural Networks and Learning Systems

April 12, 2022

Summary

This summary is machine-generated.

This study introduces a multimodal sparse transformer network (MMST) to improve audio-visual speech recognition (AVSR) in noisy environments. The novel approach enhances visual features with motion information, significantly reducing word error rates for more robust speech recognition.

Area of Science:

Artificial Intelligence
Computer Vision
Speech Processing

Background:

Automatic speech recognition (ASR) systems face performance degradation in noisy conditions.
Audio-visual speech recognition (AVSR) uses visual cues to enhance ASR, especially in adverse environments.
Transformer architectures show promise in AVSR but struggle with irrelevant information and lack motion feature integration.

Purpose of the Study:

To propose a novel multimodal sparse transformer network (MMST) for enhanced AVSR.
To address limitations of existing transformer models in handling long-term dependencies and irrelevant information.
To incorporate essential motion features into AVSR for improved spatio-temporal visual information utilization.

Main Methods:

Developed a multimodal sparse transformer network (MMST) incorporating sparse self-attention.
Integrated motion features into the visual modality processing.
Used a cross-modal attention module to exchange information between the motion and visual modalities.
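
The two attention mechanisms named above can be illustrated with a minimal sketch. This summary does not say how the MMST makes its self-attention sparse or which modality supplies the queries in the cross-modal module, so the sketch makes two assumptions: sparsity is obtained by keeping only the top-k highest-scoring keys per query (one common approach), and visual features act as queries over motion features. The names `attention`, `top_k`, `visual`, and `motion` are hypothetical and the feature values are toy numbers, not data from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values, top_k=None):
    """Scaled dot-product attention; returns (outputs, weights).

    If top_k is given, each query attends only to its top_k highest-scoring
    keys and every other key gets weight 0. This top-k rule is an assumed
    sparsification scheme, not necessarily the one used by the MMST.
    """
    scale = math.sqrt(len(keys[0]))
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        order = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)
        kept = order if top_k is None else order[:top_k]
        probs = softmax([scores[i] for i in kept])
        weights = [0.0] * len(keys)
        for i, p in zip(kept, probs):
            weights[i] = p
        # Weighted sum of the value vectors, one output dimension at a time.
        outputs.append([sum(w * v[d] for w, v in zip(weights, values))
                        for d in range(len(values[0]))])
        all_weights.append(weights)
    return outputs, all_weights


# Toy per-frame features (hypothetical values).
visual = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
motion = [[0.2, 0.1], [0.0, 0.3], [0.4, 0.4]]

# Sparse self-attention: queries, keys, and values all come from one sequence.
_, sparse_w = attention(visual, visual, visual, top_k=2)

# Cross-modal attention: visual queries attend over motion keys and values.
fused, cross_w = attention(visual, motion, motion)
```

With `top_k=2`, each row of `sparse_w` has exactly two non-zero weights that sum to 1, which is the sense in which sparse attention discards low-relevance positions. In the cross-modal call, each visual frame receives a weighted mixture of the motion features.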

Main Results:

The MMST model concentrated its attention more effectively on relevant global information.
Integration of motion features enhanced visual feature representation.
Experiments showed significant performance improvements over state-of-the-art methods on various datasets.
The model achieved a lower word error rate (WER), indicating higher recognition accuracy.
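
For reference, WER is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the recognized text into the reference transcript, divided by the number of words in the reference. The sketch below computes it with a word-level edit distance. The function name `word_error_rate` and the example sentences are illustrative and do not come from the paper, and the sketch assumes simple whitespace tokenization with no text normalization.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Because insertions count as errors, WER can exceed 1 when the hypothesis is much longer than the reference.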

Conclusions:

The proposed MMST effectively enhances audio-visual speech recognition performance, particularly in noisy conditions.
Incorporating motion features and sparse attention mechanisms is crucial for robust AVSR.
The MMST offers a promising direction for developing more reliable human-machine interfaces.