A CTC-Based Speech Recognition Network Fusing Local Convolution and Global Attention.

Huijuan Hu¹, Chenyang Tang¹, Ping Tan²

  • ¹ School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China.

Sensors (Basel, Switzerland) | March 28, 2026
Summary
This summary is machine-generated.

This study introduces DBA-wav2vec 2.0, which improves automatic speech recognition (ASR) by balancing local and global speech features. The new model lowers the character error rate (CER) and performs better in fast-speech scenarios.

Area of Science:

  • Artificial Intelligence
  • Computer Science
  • Speech Processing

Background:

  • Automatic speech recognition (ASR) systems face challenges in balancing global semantic understanding with local acoustic detail.
  • Existing wav2vec 2.0 models integrated with Connectionist Temporal Classification (CTC) often struggle with this trade-off.
  • This limits their performance, especially in handling variations in speech rate and acoustic conditions.
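
For readers unfamiliar with CTC, the following is a minimal sketch of greedy CTC decoding: take the most likely symbol in each frame, merge consecutive repeats, then drop blanks. The function name, the toy vocabulary, and the posterior values are illustrative assumptions and are not taken from the paper.

```python
def ctc_greedy_decode(frame_posteriors, vocab, blank=0):
    """Greedy CTC decoding: per-frame argmax, merge repeats, drop blanks."""
    # Index of the most probable symbol in every frame.
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_posteriors]
    out = []
    prev = None
    for idx in best:
        # Emit a symbol only when it differs from the previous frame
        # and is not the blank symbol.
        if idx != prev and idx != blank:
            out.append(vocab[idx])
        prev = idx
    return "".join(out)


# Toy example (hypothetical values): six frames over a three-symbol vocabulary.
vocab = ["<blank>", "a", "b"]
posteriors = [
    [0.10, 0.80, 0.10],  # a
    [0.20, 0.70, 0.10],  # a (repeat, merged)
    [0.90, 0.05, 0.05],  # blank (separates the two "a"s)
    [0.10, 0.60, 0.30],  # a
    [0.10, 0.20, 0.70],  # b
    [0.20, 0.10, 0.70],  # b (repeat, merged)
]
decoded = ctc_greedy_decode(posteriors, vocab)
print(decoded)
```

Because every output symbol is tied to individual frames, blurry per-frame posteriors translate directly into alignment errors, which is why the balance between local detail and global context matters for CTC models.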

Purpose of the Study:

  • To propose a novel architecture, DBA-wav2vec 2.0, that effectively manages the trade-off between local and global feature modeling in ASR.
  • To enhance the robustness and accuracy of speech recognition systems by decoupling temporal modeling.
  • To improve the discriminability of local acoustic features and the consistency of global semantic information.

Keywords:
CTC alignment, automatic speech recognition, dual-branch architecture, information fusion, task-aware gating, wav2vec 2.0

Main Methods:

  • Introduced DBA-wav2vec 2.0, an architecture decoupling temporal modeling into parallel local and global streams at the encoder-decoder interface.
  • Utilized depthwise separable convolutions for local acoustic structure capture and retained self-attention for long-range dependencies.
  • Implemented a task-aware gating mechanism to dynamically integrate heterogeneous features based on acoustic input.
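
The three method bullets above can be sketched in plain Python as a local branch (depthwise convolution followed by a pointwise projection), a global branch (self-attention), and a gate that mixes the two per frame. This is only an illustration under stated assumptions: the summary does not specify the gate's form, so the sigmoid gate over concatenated features, the single attention head with identity projections, and all function names and sizes here are hypothetical rather than the authors' implementation.

```python
import math


def depthwise_conv1d(x, kernels):
    """Per-channel 1D convolution over time with zero 'same' padding.
    x: T frames of D features; kernels: D kernels of odd length K."""
    T, D = len(x), len(x[0])
    K = len(kernels[0])
    half = K // 2
    out = [[0.0] * D for _ in range(T)]
    for t in range(T):
        for d in range(D):
            acc = 0.0
            for k in range(K):
                s = t + k - half
                if 0 <= s < T:
                    acc += kernels[d][k] * x[s][d]
            out[t][d] = acc
    return out


def pointwise(x, w):
    """1x1 convolution: mixes channels within each frame. w: D_in x D_out."""
    return [[sum(f[i] * w[i][j] for i in range(len(f)))
             for j in range(len(w[0]))] for f in x]


def softmax(v):
    m = max(v)
    e = [math.exp(a - m) for a in v]
    total = sum(e)
    return [a / total for a in e]


def self_attention(x):
    """Single-head scaled dot-product attention with identity
    query/key/value projections (a simplifying assumption)."""
    D = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(D) for k in x]
        w = softmax(scores)
        out.append([sum(w[s] * x[s][d] for s in range(len(x)))
                    for d in range(D)])
    return out


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def gated_fusion(local, glob, gate_w, gate_b):
    """Assumed gate: g = sigmoid(W [local; global] + b), computed per frame;
    output = g * local + (1 - g) * global."""
    fused = []
    for l, g in zip(local, glob):
        cat = l + g  # concatenate the two feature vectors
        gate = [sigmoid(sum(cat[i] * gate_w[i][j] for i in range(len(cat)))
                        + gate_b[j]) for j in range(len(l))]
        fused.append([gt * a + (1.0 - gt) * b for gt, a, b in zip(gate, l, g)])
    return fused


# Toy input: 3 frames, 2 features (hypothetical values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity_kernels = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
eye = [[1.0, 0.0], [0.0, 1.0]]
local = pointwise(depthwise_conv1d(x, identity_kernels), eye)
glob = self_attention(x)
gate_w = [[0.0, 0.0] for _ in range(4)]  # zero weights give a gate of 0.5
gate_b = [0.0, 0.0]
fused = gated_fusion(local, glob, gate_w, gate_b)
print(fused)
```

With zero gate weights the output is simply the average of the two branches; a trained gate would instead shift weight toward the local or global stream depending on the input frame.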

Main Results:

  • Achieved relative Character Error Rate (CER) reductions of 6.4% on AISHELL-1 and 7.4% on ST-CMDS compared to a baseline wav2vec 2.0 model.
  • Demonstrated a 15.3% relative improvement in fast-speech scenarios, indicating enhanced robustness against temporal variations.
  • The gating mechanism effectively refined posterior probability distributions, leading to more distinct alignment points.
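
The reported figures are relative reductions, not absolute percentage-point drops. The sketch below shows how CER and a relative reduction are typically computed; the function names are illustrative, and the 5.00% and 4.68% CER values are hypothetical numbers chosen only to reproduce a 6.4% relative reduction. They are not results from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance divided by reference length."""
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline_cer, new_cer):
    """Fraction of the baseline error removed by the new system."""
    return (baseline_cer - new_cer) / baseline_cer


# Hypothetical CER values, for illustration only.
baseline, improved = 0.0500, 0.0468
rel = relative_reduction(baseline, improved)
print(f"relative CER reduction: {rel:.1%}")
```

In this hypothetical case the absolute gain is 0.32 percentage points, so a relative reduction should be read alongside the baseline error rate.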

Conclusions:

  • DBA-wav2vec 2.0 successfully addresses the modeling trade-offs in CTC-based ASR by decoupling temporal modeling.
  • The proposed architecture enhances ASR performance, particularly in challenging conditions like fast speech.
  • Structural adaptation at the decoding interface is a promising direction for improving the robustness of modern ASR systems.