Multi-Input Speech Emotion Recognition Model Using Mel Spectrogram and GeMAPS.

Itsuki Toyoshima¹, Yoshifumi Okada², Momoko Ishimaru¹

  • ¹Division of Information and Electronic Engineering, Muroran Institute of Technology, 27-1, Mizumoto-cho, Muroran 050-8585, Hokkaido, Japan.

Sensors (Basel, Switzerland) | February 11, 2023
Summary
This summary is machine-generated.

This study introduces a speech emotion recognition model that combines mel spectrograms and GeMAPS features in a multi-input deep neural network. The model matches or exceeds the accuracy of existing state-of-the-art methods, with a particular improvement in recognizing "happiness".

Keywords:
GeMAPS; focal loss function; mel spectrogram; multi-input deep neural network; speech emotion recognition

Area of Science:

  • Speech processing
  • Artificial intelligence
  • Machine learning

Background:

  • Existing emotion recognition models use either mel spectrograms (MelSpec), which represent frequency content as a time series, or the Geneva Minimalistic Acoustic Parameter Set (GeMAPS), which summarizes multiple acoustic features, but not both.
  • MelSpec captures temporal variation but covers a narrow range of acoustic features, while GeMAPS covers many features but carries no time-series information. Neither input alone supports comprehensive learning of audio features.

Purpose of the Study:

  • To develop an advanced speech emotion recognition (SER) model by integrating MelSpec and GeMAPS features.
  • To address the limitations of existing methods by creating a model that leverages both the time-series and multi-feature aspects of audio data.
  • To improve the accuracy of SER, especially for challenging emotions like happiness, by utilizing a novel multi-input deep neural network architecture.

Main Methods:

  • A multi-input deep neural network was designed to process MelSpec in image format and GeMAPS in vector format concurrently.
  • The model integrates features learned from both MelSpec and GeMAPS to predict emotions.
  • A focal loss function was incorporated to mitigate the issue of imbalanced emotion class distribution.
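The two ideas in this list, fusing the outputs of two input branches and training with a focal loss, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The feature sizes, the single linear fusion layer, the random stand-in features, and `gamma=2.0` are all assumptions made for illustration; the loss uses the commonly cited form `-alpha_t * (1 - p_t)^gamma * log(p_t)`, which down-weights examples the model already classifies confidently.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def fused_logits(melspec_feat, gemaps_feat, weights, bias):
    """Concatenate both branch outputs and apply one linear layer.

    Hypothetical fusion step: the real model's layers are not given in the text.
    """
    fused = np.concatenate([melspec_feat, gemaps_feat], axis=1)
    return fused @ weights + bias

def focal_loss(probs, labels, gamma=2.0, alpha=None):
    """Mean multi-class focal loss over a batch.

    probs:  (batch, n_classes) predicted probabilities
    labels: (batch,) integer class indices
    alpha:  optional per-class weights; gamma=0 reduces to cross-entropy
    """
    n = probs.shape[0]
    p_t = np.clip(probs[np.arange(n), labels], 1e-12, 1.0)
    modulator = (1.0 - p_t) ** gamma
    if alpha is not None:
        modulator = modulator * np.asarray(alpha)[labels]
    return float(np.mean(-modulator * np.log(p_t)))

# Random stand-ins for the features each branch would learn (assumed sizes).
rng = np.random.default_rng(0)
batch, n_classes = 6, 4
melspec_feat = rng.normal(size=(batch, 8))   # e.g. output of an image branch
gemaps_feat = rng.normal(size=(batch, 5))    # e.g. output of a vector branch
weights = rng.normal(size=(13, n_classes))   # 8 + 5 fused features
bias = np.zeros(n_classes)
labels = np.array([0, 1, 2, 3, 0, 1])

probs = softmax(fused_logits(melspec_feat, gemaps_feat, weights, bias))
cross_entropy = focal_loss(probs, labels, gamma=0.0)
focal = focal_loss(probs, labels, gamma=2.0)
print(f"cross-entropy={cross_entropy:.4f}  focal={focal:.4f}")
```

Because the modulating factor `(1 - p_t)^gamma` never exceeds 1, the focal value is never larger than the cross-entropy on the same batch. Passing per-class `alpha` weights is one way to give rare classes more influence on the loss.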

Main Results:

  • The proposed model achieved weighted accuracy of 0.6657 and unweighted accuracy of 0.6149, outperforming or matching existing state-of-the-art methods.
  • Significant improvements were observed in recognizing the emotion "happiness", which has historically been difficult due to data limitations.
  • The model demonstrated robust performance in speech emotion classification.
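The summary reports both weighted and unweighted accuracy without defining them. The sketch below assumes the convention common in speech emotion recognition work: weighted accuracy (WA) is the overall fraction of correct predictions, and unweighted accuracy (UA) is the mean of per-class recalls, so every emotion counts equally regardless of how many samples it has. The function names and toy labels are hypothetical.

```python
import numpy as np

def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: correct predictions divided by total samples."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so each class contributes equally."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))

# Toy imbalanced example: class 0 dominates, classes 1 and 2 are rare.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 0, 0, 1, 0, 2, 0]

wa = weighted_accuracy(y_true, y_pred)    # 8 of 10 correct
ua = unweighted_accuracy(y_true, y_pred)  # mean of recalls 1.0, 0.5, 0.5
print(f"WA={wa:.4f}  UA={ua:.4f}")
```

In this toy case WA is higher than UA because the majority class is classified perfectly while the rare classes are not. Under the assumed definitions, a gap in that direction, like the one between the reported 0.6657 and 0.6149, is what class imbalance tends to produce.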

Conclusions:

  • The developed multi-input deep neural network effectively integrates diverse acoustic features for enhanced speech emotion recognition.
  • The model shows promise for practical applications in SER, particularly in improving the identification of subtle or underrepresented emotions.
  • Future development can further refine the model for broader real-world deployment.