UrduSER: A comprehensive dataset for speech emotion recognition in Urdu language
Summary
This summary is machine-generated. A new Urdu Speech Emotion Recognition Dataset (UrduSER) was created to address the lack of resources for analyzing emotions in Urdu speech. This dataset features diverse, real-world dialogues from professional actors, enhancing machine learning models for Urdu SER.
Area Of Science
- Speech Emotion Recognition (SER)
- Computational Linguistics
- Machine Learning
Background
- Speech Emotion Recognition (SER) is vital for understanding human-computer interaction and has significant socio-cultural and business applications.
- Existing datasets for Urdu, the 10th most spoken language, are limited in scope, emotion range, and dialogue diversity, hindering real-world SER applications.
- A significant research gap exists in Urdu SER due to the scarcity of comprehensive and diverse speech emotion datasets.
Purpose Of The Study
- To develop a comprehensive and balanced Speech Emotion Recognition Dataset for the Urdu language (UrduSER).
- To address the limitations of existing Urdu SER datasets by incorporating diverse, real-world dialogues and a wider range of emotions.
- To facilitate advancements in machine learning and deep learning models for Urdu speech emotion analysis.
Main Methods
- Collected 3500 speech signals, spoken by 10 professional Pakistani actors (balanced by gender and age), from drama serials and telefilms on YouTube.
- Included seven distinct emotional states: Angry, Fear, Boredom, Disgust, Happy, Neutral, and Sad, with 500 samples per emotion.
- Ensured dialogue diversity with unique content per utterance and provided detailed metadata, including scripts, for each audio sample.
Main Results
- Developed the UrduSER dataset, a comprehensive resource featuring 3500 diverse speech signals.
- The dataset includes a balanced distribution of 500 samples per emotion and 50 samples per actor per emotion.
- Expert validation confirmed the dataset's validity, reliability, and suitability for research and development.
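The reported counts are mutually consistent: 10 actors × 7 emotions × 50 samples gives 3500 signals, or 500 per emotion. A minimal Python sketch of that balance check follows. The record layout, the `actor_NN` identifiers, and the helper names `build_index` and `is_balanced` are hypothetical and used only for illustration; the dataset's actual metadata format is not described in this summary.

```python
from collections import Counter
from itertools import product

# Emotion labels as listed in the summary.
EMOTIONS = ["Angry", "Fear", "Boredom", "Disgust", "Happy", "Neutral", "Sad"]
# Hypothetical actor identifiers; the real dataset may name actors differently.
ACTORS = [f"actor_{i:02d}" for i in range(1, 11)]
SAMPLES_PER_ACTOR_PER_EMOTION = 50


def build_index():
    """Build a synthetic index with one record per (actor, emotion, utterance)."""
    return [
        {"actor": actor, "emotion": emotion, "utterance": k}
        for actor, emotion in product(ACTORS, EMOTIONS)
        for k in range(SAMPLES_PER_ACTOR_PER_EMOTION)
    ]


def is_balanced(records):
    """Check the counts stated in the summary: 3500 total, 500 per emotion,
    and 50 per actor per emotion across all 70 actor/emotion pairs."""
    per_emotion = Counter(r["emotion"] for r in records)
    per_cell = Counter((r["actor"], r["emotion"]) for r in records)
    return (
        len(records) == 3500
        and set(per_emotion.values()) == {500}
        and len(per_cell) == len(ACTORS) * len(EMOTIONS)
        and set(per_cell.values()) == {SAMPLES_PER_ACTOR_PER_EMOTION}
    )


records = build_index()
print(len(records), is_balanced(records))
```

The same check could be run against a real metadata table by replacing `build_index` with a loader that yields one record per audio file.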
Conclusions
- The UrduSER dataset effectively fills the critical research gap for Urdu speech emotion recognition.
- Its diverse, real-world nature and comprehensive metadata enhance its utility for training robust SER models.
- This resource is expected to significantly advance research and development in Urdu SER, enabling more accurate and nuanced emotion detection in spoken language.

