ET-Network: A novel efficient transformer deep learning model for automated Urdu handwritten text recognition
Summary
This summary is machine-generated. ET-Network, a novel transformer-based method, improves Urdu handwritten text recognition by integrating self-attention into EfficientNet. It establishes new state-of-the-art character and word error rates for optical character recognition in this low-resource language.
Area Of Science
- Computer Science
- Artificial Intelligence
- Natural Language Processing
Background
- Urdu handwritten text recognition (UHTR) is challenging because writing styles are inconsistent and, unlike for high-resource languages, training data is limited.
- Existing optical character recognition (OCR) methods struggle with the complexities of Urdu's cursive script and data scarcity.
- Transformer models show promise for UHTR, addressing limitations of traditional approaches.
Purpose Of The Study
- To develop an advanced transformer-based method for accurate Urdu handwritten text recognition.
- To enhance feature extraction by integrating self-attention mechanisms into EfficientNet.
- To establish a new state-of-the-art performance benchmark for UHTR.
Main Methods
- Proposed ET-Network, combining EfficientNet with self-attention for robust feature extraction.
- Utilized a vanilla transformer architecture for language modeling and text generation.
- Employed prefix beam search for optimizing recognition outcomes.
- Trained and evaluated the model on three diverse Urdu handwritten datasets: NUST-UHWR, UPTI2.0, and MMU-OCR-21.
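The prefix beam search step listed above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes CTC-style per-frame probabilities with a blank symbol at index 0, which this summary does not confirm for ET-Network. The function name `ctc_prefix_beam_search`, the `labels` list, and the `beam_width` parameter are all hypothetical.

```python
from collections import defaultdict


def ctc_prefix_beam_search(frames, labels, beam_width=4, blank=0):
    """Return the most probable label string under CTC collapsing rules.

    frames: list of per-time-step probability lists, one entry per label index.
    labels: label strings indexed like the probabilities; labels[blank] is unused.
    All names and the CTC assumption are illustrative, not taken from the paper.
    """
    # Each prefix keeps two scores: the probability of alignments ending in
    # blank, and the probability of alignments ending in a non-blank symbol.
    beams = {(): (1.0, 0.0)}
    for frame in frames:
        candidates = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_blank, p_non_blank) in beams.items():
            for symbol, p in enumerate(frame):
                if symbol == blank:
                    # A blank leaves the prefix unchanged.
                    candidates[prefix][0] += (p_blank + p_non_blank) * p
                elif prefix and prefix[-1] == symbol:
                    # A repeat with no blank in between collapses into the prefix.
                    candidates[prefix][1] += p_non_blank * p
                    # A repeat after a blank starts a new character.
                    candidates[prefix + (symbol,)][1] += p_blank * p
                else:
                    candidates[prefix + (symbol,)][1] += (p_blank + p_non_blank) * p
        # Keep only the beam_width most probable prefixes.
        ranked = sorted(candidates.items(), key=lambda kv: sum(kv[1]), reverse=True)
        beams = {prefix: tuple(scores) for prefix, scores in ranked[:beam_width]}
    best = max(beams, key=lambda prefix: sum(beams[prefix]))
    return "".join(labels[i] for i in best)


# Two frames over {blank, "a"}: greedy decoding picks blank twice (""),
# while summing over all alignments favours "a" (0.64 vs 0.36).
frames = [[0.6, 0.4], [0.6, 0.4]]
print(ctc_prefix_beam_search(frames, ["", "a"]))  # prints "a"
```

The toy input shows why merging alignments per prefix matters: the single best alignment is all blanks, yet the three alignments that collapse to "a" together carry more probability.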
Main Results
- Reduced character error rate (CER) by 4% and word error rate (WER) by 1.55% compared with existing methods.
- Established a new state-of-the-art CER of 5.27% and WER of 19.09% for Urdu handwritten text recognition.
- Demonstrated the effectiveness of integrating self-attention for capturing long-range dependencies in Urdu script.
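For readers unfamiliar with the metrics reported above, CER and WER are conventionally computed as the Levenshtein edit distance between reference and prediction, divided by the reference length in characters or words. The sketch below follows that standard definition; the helper names are hypothetical, and it assumes characters are counted as Unicode code points, which may differ from the tokenization the authors used for Urdu script.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits divided by reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word edits divided by reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


print(cer("kitten", "sitting"))          # 3 edits / 6 characters = 0.5
print(wer("the cat sat", "the cat sit")) # 1 edit / 3 words
```

Both rates can exceed 100% when the prediction is much longer than the reference, since insertions count as errors.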
Conclusions
- The ET-Network offers a significant improvement over existing methods for Urdu handwritten text recognition.
- Transformer-based approaches, particularly with self-attention, are highly effective for low-resource language OCR.
- The established state-of-the-art performance highlights the potential of ET-Network for practical applications.

