Evaluating deep learning architectures for Speech Emotion Recognition.

Haytham M Fayek¹, Margaret Lech¹, Lawrence Cavedon²

¹School of Engineering, RMIT University, Melbourne VIC 3001, Australia.

Neural Networks: the Official Journal of the International Neural Network Society

April 12, 2017

Summary

This summary is machine-generated.

This study explores deep learning for speech emotion recognition (SER) and achieves state-of-the-art speaker-independent results on the IEMOCAP database. Its frame-based approach models intra-utterance speech dynamics to improve emotion detection.

Area of Science:

Computational Linguistics
Artificial Intelligence
Machine Learning

Keywords:

Affective computing
Deep learning
Emotion recognition
Neural networks
Speech recognition

Background:

Speech Emotion Recognition (SER) is crucial for human-computer interaction.
SER can be framed as either a static or a dynamic classification problem, which makes it a useful testbed for deep learning.
Existing methods often require extensive speech processing.

Purpose of the Study:

To investigate and compare various deep learning architectures for SER.
To develop a frame-based, end-to-end deep learning system for SER.
To model intra-utterance speech dynamics effectively.

Main Methods:

A frame-based formulation for SER with minimal speech processing.
End-to-end deep learning models, including feed-forward and recurrent neural networks.
Empirical exploration and comparison of different neural network architectures.
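The frame-based formulation can be illustrated with a minimal Python sketch (standard library only). It rests on three assumptions that the summary does not specify. First, each utterance is cut into fixed-length overlapping frames; 25 ms windows with a 10 ms hop at 16 kHz are common defaults, used here only for illustration. Second, every frame inherits its utterance's emotion label for training. Third, an utterance-level decision is made by averaging frame-level class probabilities. The function names are hypothetical, and the averaging rule is one common choice, not necessarily the one used in the paper.

```python
import math
from typing import List, Sequence


def frame_signal(signal: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split a 1-D signal into overlapping frames of frame_len samples, hop samples apart.

    A trailing partial frame is dropped rather than zero-padded.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    return [
        list(signal[start:start + frame_len])
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


def frame_labels(num_frames: int, utterance_label: int) -> List[int]:
    """Assumption: every frame inherits the emotion label of its utterance."""
    return [utterance_label] * num_frames


def utterance_prediction(frame_posteriors: Sequence[Sequence[float]]) -> int:
    """Average per-frame class probabilities and return the most likely class index."""
    if not frame_posteriors:
        raise ValueError("need at least one frame")
    num_frames = len(frame_posteriors)
    num_classes = len(frame_posteriors[0])
    mean = [sum(p[c] for p in frame_posteriors) / num_frames for c in range(num_classes)]
    return max(range(num_classes), key=lambda c: mean[c])


if __name__ == "__main__":
    # Hypothetical 0.1 s tone at 16 kHz: 25 ms frames (400 samples), 10 ms hop (160 samples).
    signal = [math.sin(0.1 * n) for n in range(1600)]
    frames = frame_signal(signal, frame_len=400, hop=160)
    print(len(frames))  # 8
    print(frame_labels(len(frames), utterance_label=2)[:3])  # [2, 2, 2]
    # Made-up posteriors over three emotion classes for three frames.
    print(utterance_prediction([[0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.5, 0.4, 0.1]]))  # 1
```

Averaging posteriors lets every frame vote on the final label, so a few ambiguous frames do not decide the utterance alone. A trained network would supply the per-frame probabilities that are hard-coded here.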

Main Results:

Achieved state-of-the-art speaker-independent SER results on the IEMOCAP database.
Demonstrated the advantages and limitations of feed-forward and recurrent networks in SER.
Provided quantitative and qualitative performance assessments of the models.

Conclusions:

The proposed frame-based, end-to-end deep learning system is effective for SER.
Deep learning architectures, particularly recurrent neural networks (RNNs), show strong potential for modeling speech dynamics in emotion recognition.
The findings advance the field of paralinguistic speech analysis.
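Why recurrence helps with speech dynamics can be shown in a few lines of Python. A feed-forward layer maps each frame on its own. An Elman-style recurrent layer instead computes h_t = tanh(W x_t + U h_{t-1} + b), which carries a hidden state from frame to frame. The weights, sizes, and toy inputs below are hypothetical. This is a generic textbook RNN, not the specific architectures evaluated in the paper.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def feed_forward(frames: Sequence[Vector], W: Matrix, b: Vector) -> List[Vector]:
    """y_t = tanh(W x_t + b): each frame is mapped with no memory of earlier frames."""
    return [[math.tanh(z + bi) for z, bi in zip(matvec(W, x), b)] for x in frames]


def elman_rnn(frames: Sequence[Vector], W: Matrix, U: Matrix, b: Vector) -> List[Vector]:
    """h_t = tanh(W x_t + U h_{t-1} + b) with h_0 = 0: the hidden state carries context forward."""
    h = [0.0] * len(b)
    outputs = []
    for x in frames:
        h = [math.tanh(wx + uh + bi) for wx, uh, bi in zip(matvec(W, x), matvec(U, h), b)]
        outputs.append(h)
    return outputs


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical input weights
    U = [[0.5, 0.0], [0.0, 0.5]]  # hypothetical recurrent weights
    b = [0.0, 0.0]
    # Two toy sequences that differ only in their first frame.
    seq_a = [[1.0, 0.0], [0.2, 0.2]]
    seq_b = [[-1.0, 0.0], [0.2, 0.2]]
    # Feed-forward: the identical last frame gives an identical output.
    print(feed_forward(seq_a, W, b)[-1] == feed_forward(seq_b, W, b)[-1])  # True
    # Recurrent: the last output still reflects the differing first frame.
    print(elman_rnn(seq_a, W, U, b)[-1] == elman_rnn(seq_b, W, U, b)[-1])  # False
```

A practical SER model would learn W, U, and b from data, stack several layers, and often use gated cells such as LSTMs. The sketch only shows how recurrence makes a frame's output depend on the frames before it.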