Selective Acoustic Feature Enhancement for Speech Emotion Recognition With Noisy Speech
Summary
This summary is machine-generated. This study introduces a novel speech enhancement method for speech emotion recognition (SER) systems. By selectively enhancing only weak acoustic features, the proposed approach significantly improves emotion recognition performance in noisy conditions.
Area Of Science
- Speech processing
- Machine learning
- Acoustic analysis
Background
- Real-world speech emotion recognition (SER) systems face challenges with background noise.
- Speech enhancement (SE) modules can improve speech quality but may degrade crucial SER features.
- Existing SE methods risk altering robust acoustic features essential for accurate emotion recognition.
Purpose Of The Study
- To develop a targeted speech enhancement strategy for SER systems operating in noisy environments.
- To enhance only the weak acoustic features that negatively impact emotion recognition performance.
- To preserve robust features that are resilient to environmental variations.
Main Methods
- Identified weak features using multiple single-feature acoustic models trained on clean speech.
- Ranked features based on performance, robustness, and a combined joint rank.
- Selectively enhanced identified weak low-level descriptors (LLDs), preserving robust features.
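The ranking and selection steps listed above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of a clean-to-noisy score drop as the robustness measure, the sum of the two ranks as the joint rank, and the toy scores are all assumptions made for illustration.

```python
from typing import Dict, List


def rank_positions(scores: Dict[str, float], higher_is_better: bool = True) -> Dict[str, int]:
    """Map each feature name to its rank (1 = best) under the given score."""
    ordered = sorted(scores, key=lambda name: scores[name], reverse=higher_is_better)
    return {name: position for position, name in enumerate(ordered, start=1)}


def find_weak_features(clean_score: Dict[str, float],
                       noisy_score: Dict[str, float],
                       num_weak: int) -> List[str]:
    """Return the num_weak features with the worst joint rank.

    clean_score / noisy_score hold the score of one single-feature model
    per feature, evaluated on clean and on noisy speech (hypothetical inputs).
    """
    # Performance rank: a higher single-feature score is better.
    perf_rank = rank_positions(clean_score, higher_is_better=True)
    # Robustness rank (assumption): a smaller clean-to-noisy drop is better.
    drop = {name: clean_score[name] - noisy_score[name] for name in clean_score}
    robust_rank = rank_positions(drop, higher_is_better=False)
    # Joint rank (assumption): sum of the two ranks; lower is better.
    joint = {name: perf_rank[name] + robust_rank[name] for name in clean_score}
    ordered = sorted(joint, key=lambda name: (joint[name], name))
    return ordered[-num_weak:] if num_weak > 0 else []


def select_llds(noisy_llds: Dict[str, List[float]],
                enhanced_llds: Dict[str, List[float]],
                weak: List[str]) -> Dict[str, List[float]]:
    """Use the enhanced version of weak LLDs and keep robust LLDs untouched."""
    weak_set = set(weak)
    return {
        name: enhanced_llds[name] if name in weak_set else noisy_llds[name]
        for name in noisy_llds
    }


# Toy scores for four hypothetical LLDs.
clean = {"f0": 0.60, "mfcc1": 0.57, "loudness": 0.55, "jitter": 0.30}
noisy = {"f0": 0.58, "mfcc1": 0.52, "loudness": 0.35, "jitter": 0.05}
weak_features = find_weak_features(clean, noisy, num_weak=2)

noisy_llds = {"f0": [1.0], "mfcc1": [2.0], "loudness": [3.0], "jitter": [4.0]}
enhanced_llds = {"f0": [1.1], "mfcc1": [2.1], "loudness": [3.1], "jitter": [4.1]}
mixed_llds = select_llds(noisy_llds, enhanced_llds, weak_features)
```

In this toy case only the two features with the worst joint rank are replaced by their enhanced versions, while the remaining features pass through unchanged.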
Main Results
- Directly enhancing weak LLDs outperformed extracting LLDs from fully enhanced speech.
- Achieved significant performance gains: 17.7% (arousal), 21.2% (dominance), and 3.3% (valence) at 10 dB SNR.
- Outperformed a system that enhanced all LLDs on the MSP-Podcast corpus.
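For readers unfamiliar with the 10 dB SNR condition, the sketch below shows one common way to build a noisy signal at a target signal-to-noise ratio by scaling the noise before adding it to the speech. The summary does not describe how the noisy speech was created, so this is only an assumed, generic procedure; the function name `mix_at_snr` and the toy signals are hypothetical.

```python
import math
from typing import List


def mix_at_snr(speech: List[float], noise: List[float], snr_db: float) -> List[float]:
    """Add noise to speech so the mixture has the requested SNR in dB."""
    if len(speech) != len(noise) or not speech:
        raise ValueError("speech and noise must be non-empty and of equal length")
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    if noise_power == 0.0:
        raise ValueError("noise must not be silent")
    # SNR_dB = 10 * log10(speech_power / (gain^2 * noise_power)); solve for gain.
    gain = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


# Toy signals: a short sine wave as "speech" and an alternating sequence as "noise".
speech = [math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]
noise = [0.3 if t % 2 == 0 else -0.3 for t in range(100)]
noisy = mix_at_snr(speech, noise, snr_db=10.0)

# Recover the SNR actually achieved from the added noise component.
added = [y - s for y, s in zip(noisy, speech)]
achieved_snr_db = 10 * math.log10(
    sum(s * s for s in speech) / sum(a * a for a in added)
)
```

At 10 dB SNR the speech power is ten times the power of the added noise.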
Conclusions
- Targeted enhancement of weak features is a more effective strategy for SER in noisy conditions.
- The proposed method preserves discriminative acoustic information crucial for robust emotion recognition.
- This approach improves SER performance across the arousal, dominance, and valence dimensions, with the largest gains for arousal and dominance.

