Evaluating deep learning architectures for Speech Emotion Recognition.

Haytham M Fayek (1), Margaret Lech (1), Lawrence Cavedon (2)

  • (1) School of Engineering, RMIT University, Melbourne VIC 3001, Australia.

Neural Networks: the Official Journal of the International Neural Network Society | April 12, 2017
Summary
This summary is machine-generated.

This study explores deep learning for speech emotion recognition (SER), achieving state-of-the-art results on the IEMOCAP database. The frame-based approach effectively models speech dynamics for improved emotion detection.

Area of Science:

  • Computational Linguistics
  • Artificial Intelligence
  • Machine Learning

Background:

  • Speech Emotion Recognition (SER) is crucial for human-computer interaction.
  • SER can be framed as static or dynamic classification, offering a robust testbed for deep learning.
  • Existing methods often require extensive speech processing.
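The static versus dynamic framing mentioned above can be illustrated with a small sketch. It is only an illustration under stated assumptions: the feature values are invented, the choice of mean and standard deviation as utterance-level statistics is an assumption, and the function names `static_representation` and `dynamic_representation` are hypothetical and not taken from the paper.

```python
from statistics import mean, pstdev


def static_representation(frames):
    """Collapse a variable-length utterance into one fixed-length vector
    (per-dimension mean and population standard deviation over all frames)."""
    dims = list(zip(*frames))  # transpose: one tuple per feature dimension
    return [mean(d) for d in dims] + [pstdev(d) for d in dims]


def dynamic_representation(frames, context=1):
    """Keep the time axis: each frame is concatenated with `context`
    neighbours on either side, so a model sees a short window per step."""
    n = len(frames)
    windows = []
    for t in range(n):
        window = []
        for k in range(t - context, t + context + 1):
            idx = min(max(k, 0), n - 1)  # repeat edge frames as padding
            window.extend(frames[idx])
        windows.append(window)
    return windows


# Toy utterance: 4 frames with 2 features each (values are made up).
utterance = [[0.0, 1.0], [2.0, 1.0], [4.0, 1.0], [6.0, 1.0]]
static_vec = static_representation(utterance)               # one vector per utterance
dynamic_seq = dynamic_representation(utterance, context=1)  # one window per frame
```

In the static view a classifier receives a single vector per utterance regardless of its length; in the dynamic view it receives one input per frame, which preserves the ordering of events within the utterance.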

Purpose of the Study:

  • To investigate and compare various deep learning architectures for SER.
  • To develop a frame-based, end-to-end deep learning system for SER.
  • To model intra-utterance speech dynamics effectively.

Main Methods:

  • A frame-based formulation for SER with minimal speech processing.
  • End-to-end deep learning models, including feed-forward and recurrent neural networks.
  • Empirical exploration and comparison of different neural network architectures.

Main Results:

  • Achieved state-of-the-art speaker-independent SER results on the IEMOCAP database.
  • Demonstrated the advantages and limitations of feed-forward and recurrent networks in SER.
  • Provided quantitative and qualitative performance assessments of the models.

Conclusions:

  • The proposed frame-based, end-to-end deep learning system is effective for SER.
  • Deep learning architectures, particularly RNNs, show strong potential for modeling speech dynamics in emotion recognition.
  • The findings advance the field of paralinguistic speech analysis.

Keywords:

  • Affective computing
  • Deep learning
  • Emotion recognition
  • Neural networks
  • Speech recognition
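
One way to picture a frame-based SER pipeline of the kind summarized here is as a per-frame classifier whose outputs are combined into one utterance-level decision. The sketch below is a minimal illustration, not the authors' system: the summary does not say how frame outputs are combined, so averaging the frame posteriors is an assumption, the single linear layer stands in for a trained network, and the label set, weights, and feature values are all hypothetical.

```python
import math

EMOTIONS = ["angry", "happy", "neutral", "sad"]  # hypothetical label set


def softmax(scores):
    """Convert raw scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def frame_posteriors(frames, weights, biases):
    """Stand-in frame classifier: one linear layer followed by softmax.
    A trained feed-forward or recurrent network would take its place."""
    posteriors = []
    for x in frames:
        scores = [
            sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)
        ]
        posteriors.append(softmax(scores))
    return posteriors


def utterance_label(posteriors):
    """Average the frame posteriors and pick the most probable class
    (an assumed aggregation rule, chosen for simplicity)."""
    n = len(posteriors)
    avg = [sum(p[c] for p in posteriors) / n for c in range(len(EMOTIONS))]
    return EMOTIONS[avg.index(max(avg))], avg


# Toy inputs: 3 frames with 2 features each, and made-up weights.
frames = [[2.0, 0.0], [3.0, 0.5], [0.0, 1.0]]
weights = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [-1.0, -1.0]]
biases = [0.0, 0.0, 0.0, 0.0]

posts = frame_posteriors(frames, weights, biases)
label, avg = utterance_label(posts)
```

Averaging lets two confident frames outweigh one frame that favours a different class, which is one simple way a frame-level model can still produce a single label per utterance.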