Updated: Aug 10, 2025

Multi-Input Speech Emotion Recognition Model Using Mel Spectrogram and GeMAPS.

Itsuki Toyoshima¹, Yoshifumi Okada², Momoko Ishimaru¹

¹Division of Information and Electronic Engineering, Muroran Institute of Technology, 27-1, Mizumoto-cho, Muroran 050-8585, Hokkaido, Japan.

Sensors (Basel, Switzerland) | February 11, 2023

Summary

This summary is machine-generated.

This study introduces a speech emotion recognition model that combines mel spectrograms and GeMAPS features in a multi-input deep neural network. The model matches or outperforms existing state-of-the-art methods in accuracy and, in particular, improves recognition of "happiness".

Keywords:

GeMAPS
focal loss function
mel spectrogram
multi-input deep neural network
speech emotion recognition

Area of Science:

Speech processing
Artificial intelligence
Machine learning

Background:

Current emotion recognition models use either mel spectrograms (MelSpec), which represent frequency content as a time series, or the Geneva Minimalistic Acoustic Parameter Set (GeMAPS), which describes multiple acoustic features, but not both.
MelSpec captures temporal variation but does not cover diverse acoustic features, while GeMAPS covers multiple features but carries no temporal information, which leaves a gap in comprehensive audio feature learning.

Purpose of the Study:

To develop an advanced speech emotion recognition (SER) model by integrating MelSpec and GeMAPS features.
To address the limitations of existing methods by creating a model that leverages both the time-series and multi-feature aspects of audio data.
To improve the accuracy of SER, especially for challenging emotions like happiness, by utilizing a novel multi-input deep neural network architecture.

Main Methods:

A multi-input deep neural network was designed to process MelSpec in image format and GeMAPS in vector format concurrently.
The model integrates features learned from both MelSpec and GeMAPS to predict emotions.
A focal loss function was incorporated to mitigate the issue of imbalanced emotion class distribution.
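The methods above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the source does not say how the two feature sets are fused (concatenation followed by one linear layer is assumed here), and the embedding sizes, the four-class setup, the weights, and the focal loss parameters `gamma` and `alpha` are all hypothetical. The loss follows the standard focal loss form, -alpha_t * (1 - p_t)^gamma * log(p_t).

```python
import math

def fuse_and_classify(mel_feat, gemaps_feat, weights, bias):
    """Concatenate two branch embeddings and apply one linear layer.

    Concatenation is an assumed fusion strategy, used for illustration only.
    """
    fused = list(mel_feat) + list(gemaps_feat)
    return [sum(w * x for w, x in zip(row, fused)) + b
            for row, b in zip(weights, bias)]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def focal_loss(logits, target, gamma=2.0, alpha=None):
    """Focal loss for one sample; gamma=0 and alpha=None gives cross-entropy."""
    p_t = max(softmax(logits)[target], 1e-12)  # clamp to avoid log(0)
    weight = 1.0 if alpha is None else alpha[target]
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)

# Hypothetical branch outputs: a MelSpec embedding and a GeMAPS embedding.
mel_feat = [1.0, 2.0]
gemaps_feat = [0.5]
# Hypothetical 4-class linear layer over the 3 fused features.
weights = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 0.0],
           [0.0, 0.0, 1.0],
           [0.0, 0.0, 0.0]]
bias = [0.0, 0.0, 0.0, 0.0]

logits = fuse_and_classify(mel_feat, gemaps_feat, weights, bias)

# Class 1 is well classified here; class 3 is poorly classified.
for target in (1, 3):
    ce = focal_loss(logits, target, gamma=0.0)
    fl = focal_loss(logits, target, gamma=2.0)
    print(target, round(ce, 4), round(fl, 4))
```

With `gamma` above zero, the loss of the well-classified sample shrinks far more than that of the poorly classified one, so training gradients concentrate on hard examples, which in an imbalanced dataset tend to come from underrepresented classes.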

Main Results:

The proposed model achieved a weighted accuracy of 0.6657 and an unweighted accuracy of 0.6149, outperforming or matching existing state-of-the-art methods.
Significant improvements were observed in recognizing the emotion "happiness", which has historically been difficult due to data limitations.
The model demonstrated robust performance in speech emotion classification.
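The source does not define the two accuracy measures. The sketch below assumes the definitions commonly used in speech emotion recognition: weighted accuracy is the overall fraction of correctly classified samples, and unweighted accuracy is the mean of the per-class recalls. The labels and predictions are hypothetical toy data, not results from the study.

```python
from collections import defaultdict

def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: correct predictions divided by all samples."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def unweighted_accuracy(y_true, y_pred):
    """Mean per-class recall: every class counts equally, whatever its size."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)

# Hypothetical imbalanced toy data: 8 "neutral" samples, 2 "happiness" samples.
y_true = ["neutral"] * 8 + ["happiness"] * 2
y_pred = ["neutral"] * 8 + ["neutral", "happiness"]

wa = weighted_accuracy(y_true, y_pred)
ua = unweighted_accuracy(y_true, y_pred)
print(wa, ua)
```

Under these definitions a model can score a high weighted accuracy while missing much of a small class, which is why both numbers are reported for imbalanced emotion data.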

Conclusions:

The developed multi-input deep neural network effectively integrates diverse acoustic features for enhanced speech emotion recognition.
The model shows promise for practical applications in SER, particularly in improving the identification of subtle or underrepresented emotions.
Future development can further refine the model for broader real-world deployment.