A CTC-Based Speech Recognition Network Fusing Local Convolution and Global Attention.

Huijuan Hu¹, Chenyang Tang¹, Ping Tan²

¹School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China.

Sensors (Basel, Switzerland)

March 28, 2026

Summary

This summary is machine-generated.


This study introduces DBA-wav2vec 2.0, improving automatic speech recognition (ASR) by balancing local and global speech features. The new model reduces character errors and enhances performance in fast-speech scenarios.

Area of Science:

Artificial Intelligence
Computer Science
Speech Processing

Background:

Automatic speech recognition (ASR) systems face challenges in balancing global semantic understanding with local acoustic detail.
Existing wav2vec 2.0 models integrated with Connectionist Temporal Classification (CTC) often struggle with this trade-off.
This limits their performance, especially in handling variations in speech rate and acoustic conditions.
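For readers unfamiliar with CTC, the following minimal sketch shows the general CTC collapsing rule (merge consecutive repeats, then drop blanks) applied to per-frame labels. It is an illustration of CTC in general, not the decoder used in the study; the blank symbol `_`, the function name, and the toy label paths are assumptions made for this example. The second path mimics faster speech, where each character spans fewer frames, which suggests why frame-level precision may matter more as speech rate rises.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol chosen for this illustration


def ctc_greedy_collapse(frame_labels, blank=BLANK):
    """Collapse a per-frame label path into an output sequence (CTC rule)."""
    # Step 1: merge runs of identical consecutive labels into one label.
    merged = [label for label, _ in groupby(frame_labels)]
    # Step 2: remove blank symbols, which only serve as separators.
    return [label for label in merged if label != blank]


# Toy paths (assumed, not from the paper): same word, different speech rates.
slow_path = list("hh_ee_l_ll_oo")  # several frames per character
fast_path = list("h_el_lo")        # roughly one frame per character

print("".join(ctc_greedy_collapse(slow_path)))
print("".join(ctc_greedy_collapse(fast_path)))
```

Both toy paths collapse to the same word. Note that the repeated character survives only because a blank frame separates its two occurrences; without that blank, the two would be merged into one.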

Purpose of the Study:

To propose a novel architecture, DBA-wav2vec 2.0, that effectively manages the trade-off between local and global feature modeling in ASR.
To enhance the robustness and accuracy of speech recognition systems by decoupling temporal modeling.
To improve the discriminability of local acoustic features and the consistency of global semantic information.

Keywords:

CTC alignment
automatic speech recognition
dual-branch architecture
information fusion
task-aware gating
wav2vec 2.0

Main Methods:

Introduced DBA-wav2vec 2.0, an architecture decoupling temporal modeling into parallel local and global streams at the encoder-decoder interface.
Utilized depthwise separable convolutions for local acoustic structure capture and retained self-attention for long-range dependencies.
Implemented a task-aware gating mechanism to dynamically integrate heterogeneous features based on acoustic input.
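The dual-branch idea described under Main Methods can be sketched in a few dozen lines of plain Python. This is a rough illustration under stated assumptions, not the authors' implementation: the summary does not give layer sizes, kernel widths, or the exact form of the gate, so the single-head attention with identity projections, the per-channel sigmoid gate computed from the concatenated branch outputs, and all function and variable names here are hypothetical.

```python
import math
import random


def depthwise_separable_conv(x, dw, pw):
    """Local branch: per-channel temporal filter, then a pointwise channel mix.

    x: T x D frames, dw: D x K depthwise kernels, pw: D x D pointwise weights.
    """
    T, D, K = len(x), len(x[0]), len(dw[0])
    pad = K // 2  # zero padding keeps the output length equal to T
    depth = [[sum(dw[d][k] * x[t + k - pad][d]
                  for k in range(K) if 0 <= t + k - pad < T)
              for d in range(D)] for t in range(T)]
    return [[sum(pw[j][d] * depth[t][d] for d in range(D)) for j in range(D)]
            for t in range(T)]


def self_attention(x):
    """Global branch: single-head scaled dot-product attention.

    Query, key, and value projections are the identity (a simplification).
    """
    T, D = len(x), len(x[0])
    out = []
    for t in range(T):
        scores = [sum(a * b for a, b in zip(x[t], x[s])) / math.sqrt(D)
                  for s in range(T)]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(v - top) for v in scores]
        total = sum(exps)
        out.append([sum(exps[s] / total * x[s][d] for s in range(T))
                    for d in range(D)])
    return out


def gated_fusion(local, glob, wg, bg):
    """Blend the branches with a per-frame, per-channel sigmoid gate.

    The gate form is an assumption: sigmoid(wg @ concat(local, glob) + bg).
    """
    fused = []
    for loc, glo in zip(local, glob):
        both = loc + glo  # list concatenation, length 2 * D
        gate = [1 / (1 + math.exp(-(sum(w * v for w, v in zip(row, both)) + b)))
                for row, b in zip(wg, bg)]
        fused.append([a * lv + (1 - a) * gv
                      for a, lv, gv in zip(gate, loc, glo)])
    return fused


def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
T, D, K = 6, 4, 3  # toy sizes: frames, channels, kernel width
x = rand_matrix(T, D)  # stands in for encoder output frames
local = depthwise_separable_conv(x, rand_matrix(D, K), rand_matrix(D, D))
glob = self_attention(x)
fused = gated_fusion(local, glob, rand_matrix(D, 2 * D), [0.0] * D)
print(len(fused), len(fused[0]))  # 6 4
```

Because the gate lies strictly between 0 and 1, each fused value is a convex combination of the local and global features for that frame and channel, so the gate decides frame by frame how much each stream contributes to the representation passed on for CTC decoding.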

Main Results:

Achieved relative Character Error Rate (CER) reductions of 6.4% on AISHELL-1 and 7.4% on ST-CMDS compared to a baseline wav2vec 2.0 model.
Demonstrated a 15.3% relative improvement in fast-speech scenarios, indicating enhanced robustness against temporal variations.
The gating mechanism effectively refined posterior probability distributions, leading to more distinct alignment points.
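The figures above are relative reductions, which are easy to confuse with absolute percentage-point changes. The sketch below shows how a Character Error Rate is typically computed (character-level edit distance divided by reference length) and how a relative reduction follows from two CER values. The example strings and the CER values of 10.0 and 9.0 are invented purely for illustration; the summary does not report the study's absolute CERs, and the function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character Error Rate: edit distance over reference length."""
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline, improved):
    """Fraction of the baseline error removed by the improved system."""
    return (baseline - improved) / baseline


# Invented example: one wrong character in a six-character reference.
print(f"{cer('今天天气很好', '今天天汽很好'):.3f}")  # 0.167

# Invented CER values (in percent): a 1.0-point absolute drop from 10.0
# corresponds to a 10% relative reduction.
print(f"{relative_reduction(10.0, 9.0):.1%}")  # 10.0%
```

Under this definition, a given relative reduction translates into a smaller absolute change when the baseline CER is already low.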

Conclusions:

DBA-wav2vec 2.0 successfully addresses the modeling trade-offs in CTC-based ASR by decoupling temporal modeling.
The proposed architecture enhances ASR performance, particularly in challenging conditions like fast speech.
Structural adaptation at the decoding interface is a promising direction for improving the robustness of modern ASR systems.