Audio-Based Emotion Recognition Using Self-Supervised Learning on an Engineered Feature Space.

Peranut Nimitsurachat¹, Peter Washington²

¹Institute for Computational and Mathematical Engineering (ICME), Stanford University, Stanford, CA 94305, USA.

AI (Basel, Switzerland)

May 8, 2024

Summary

This summary is machine-generated.

Self-supervised learning (SSL) improves audio-based emotion recognition models, especially when labeled data are scarce. Pre-training on acoustic features raises performance, with the largest gains for emotions that are easier to classify.

Keywords:

emotion classification; emotion recognition; self-supervised learning; transfer learning

Area of Science:

Affective computing
Machine learning
Speech processing

Background:

Emotion recognition from audio is crucial for interactive systems in various fields.
A key challenge is the limited availability of labeled training data for high-performance models.
Self-supervised learning (SSL) offers a solution by learning from data properties without extensive labels.

Purpose of the Study:

To investigate the effectiveness of self-supervised learning pre-training for audio-based emotion recognition.
To apply SSL to encoded acoustic features from the CMU-MOSEI dataset.
To evaluate the impact of SSL on model performance compared to a baseline deep learning model.

Main Methods:

Applied self-supervised learning pre-training to encoded acoustic data (74 features) from the CMU-MOSEI dataset.
Pre-trained the model to reconstruct the acoustic features at masked timestamps.
Fine-tuned the pre-trained model using a small set of annotated data.
Evaluated performance using Mean Absolute Error (MAE) and four-class accuracy, comparing against a baseline.
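
The masking objective and the evaluation metrics listed above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the 15% mask ratio, the use of zeros as the mask value, the squared-error reconstruction loss, and all function names are assumptions made for the example. Only the 74-feature input width and the choice of MAE and accuracy as metrics come from the summary.

```python
import random

NUM_FEATURES = 74  # per-timestamp acoustic feature count stated above


def mask_timesteps(sequence, mask_ratio=0.15, seed=0):
    """Zero out a random subset of timestamps.

    Returns (masked_sequence, masked_indices). The mask ratio and the
    zero mask value are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_steps = len(sequence)
    n_masked = max(1, round(mask_ratio * n_steps))
    masked_indices = sorted(rng.sample(range(n_steps), n_masked))
    masked_set = set(masked_indices)
    masked = [
        [0.0] * len(frame) if t in masked_set else list(frame)
        for t, frame in enumerate(sequence)
    ]
    return masked, masked_indices


def reconstruction_loss(predicted, target, masked_indices):
    """Mean squared error computed only over the masked timestamps."""
    total, count = 0.0, 0
    for t in masked_indices:
        for p, y in zip(predicted[t], target[t]):
            total += (p - y) ** 2
            count += 1
    return total / count


def mean_absolute_error(predicted, target):
    """MAE between predicted and annotated emotion intensities."""
    return sum(abs(p - y) for p, y in zip(predicted, target)) / len(target)


def accuracy(predicted_classes, true_classes):
    """Fraction of examples whose predicted class matches the label."""
    hits = sum(p == y for p, y in zip(predicted_classes, true_classes))
    return hits / len(true_classes)


# Synthetic clip: 20 timestamps of 74 random features (not real data).
data_rng = random.Random(1)
clip = [[data_rng.random() for _ in range(NUM_FEATURES)] for _ in range(20)]
masked_clip, idx = mask_timesteps(clip)

# A model would predict the hidden frames; here the masked input itself
# stands in as a trivial "prediction" just to exercise the loss.
loss = reconstruction_loss(masked_clip, clip, idx)
```

During pre-training, a network would receive `masked_clip` and be optimized to lower `reconstruction_loss` on the hidden timestamps, which requires no emotion labels. The metric helpers apply only at the fine-tuning stage, where annotated examples are available.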

Main Results:

Self-supervised learning consistently improved model performance across all evaluated metrics (MAE, accuracy).
Performance gains were most significant when the amount of annotated data for fine-tuning was small.
SSL yielded notable improvements for more easily classified emotions such as happiness, sadness, and anger.
SSL improved performance even when applied to embedded feature representations, not just raw audio data.

Conclusions:

Self-supervised learning is highly beneficial for audio-based emotion recognition, particularly in low-data regimes.
SSL effectively enhances affective computing models by leveraging unlabeled data.
The study validates SSL's utility on encoded acoustic features, offering a practical approach for improving emotion recognition systems.