Audio-Based Emotion Recognition Using Self-Supervised Learning on an Engineered Feature Space.

Peranut Nimitsurachat¹, Peter Washington²

  • ¹Institute for Computational and Mathematical Engineering (ICME), Stanford University, Stanford, CA 94305, USA.

AI (Basel, Switzerland) | May 8, 2024
Summary
This summary is machine-generated.

Self-supervised learning (SSL) improves audio-based emotion recognition models, especially when labeled data is scarce. Pre-training on unlabeled acoustic features boosts performance, and the method proves most effective for emotions that are easier to classify.

Keywords:
emotion classification; emotion recognition; self-supervised learning; transfer learning

Area of Science:

  • Affective computing
  • Machine learning
  • Speech processing

Background:

  • Emotion recognition from audio is crucial for interactive systems in various fields.
  • A key challenge is the limited availability of labeled training data for high-performance models.
  • Self-supervised learning (SSL) addresses this by learning representations from the inherent structure of unlabeled data, reducing the need for extensive labels.

Purpose of the Study:

  • To investigate the effectiveness of self-supervised learning pre-training for audio-based emotion recognition.
  • To apply SSL to encoded acoustic features from the CMU-MOSEI dataset.
  • To evaluate the impact of SSL on model performance compared to a baseline deep learning model.

Main Methods:

  • Applied self-supervised learning pre-training to encoded acoustic data (74 features) from the CMU-MOSEI dataset.
  • Pre-trained the model to predict the acoustic features at masked timesteps.
  • Fine-tuned the pre-trained model using a small set of annotated data.
  • Evaluated performance using Mean Absolute Error (MAE) and four-class accuracy, comparing against a baseline.
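The masked-timestep pretext task described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `mask_timesteps` and `masked_reconstruction_loss`, the default masking ratio, zero-filling of hidden frames, and the mean-squared-error objective are all hypothetical choices. The source states only that the model was pre-trained to predict masked timesteps of the 74-dimensional acoustic features.

```python
import numpy as np


def mask_timesteps(features, mask_ratio=0.15, rng=None):
    """Hide a random subset of timesteps for a masked-prediction pretext task.

    features: array of shape (T, D), e.g. D = 74 acoustic features per timestep.
    Returns (masked_input, mask), where mask[t] is True for hidden timesteps.
    The default ratio and zero-filling are illustrative assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    num_steps = features.shape[0]
    n_masked = max(1, int(round(mask_ratio * num_steps)))
    masked_idx = rng.choice(num_steps, size=n_masked, replace=False)
    mask = np.zeros(num_steps, dtype=bool)
    mask[masked_idx] = True
    masked_input = features.copy()  # leave the original sequence untouched
    masked_input[mask] = 0.0        # assumption: hidden frames are zero-filled
    return masked_input, mask


def masked_reconstruction_loss(predicted, target, mask):
    """Mean squared error over the masked timesteps only.

    No annotations are needed: the targets are the original feature
    values that were hidden from the model.
    """
    diff = predicted[mask] - target[mask]
    return float(np.mean(diff ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(50, 74))  # synthetic stand-in for one utterance
    x_masked, mask = mask_timesteps(x, mask_ratio=0.2, rng=rng)
    # Echoing the masked input cannot recover the hidden frames, so the loss
    # is positive; pre-training would optimize a model to drive it down.
    print(mask.sum(), masked_reconstruction_loss(x_masked, x, mask))
```

Because the loss only looks at hidden frames, the model cannot succeed by copying its input; after pre-training, the same network would be fine-tuned on the small annotated set.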

Main Results:

  • Self-supervised learning consistently improved model performance across all evaluated metrics (MAE, accuracy).
  • Performance gains were largest when only a small amount of annotated data was available for fine-tuning.
  • SSL yielded notable improvements for easily classifiable emotions such as happy, sad, and angry.
  • SSL improved performance even when applied to pre-computed (embedded) feature representations rather than raw audio.
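As a rough sketch of the two evaluation metrics, the code below computes MAE and a four-class accuracy. It assumes emotion intensity labels on a 0 to 3 scale and assumes four-class accuracy is obtained by rounding both labels and predictions to the nearest integer class; the source does not specify the binning, and the function names are hypothetical.

```python
import numpy as np


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted intensities."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def to_four_classes(scores):
    """Map intensity scores to classes {0, 1, 2, 3}.

    Assumption: round half up, then clip to the assumed 0-3 label range.
    """
    scores = np.asarray(scores, dtype=float)
    return np.clip(np.floor(scores + 0.5), 0, 3).astype(int)


def four_class_accuracy(y_true, y_pred):
    """Fraction of samples whose rounded prediction matches the rounded label."""
    return float(np.mean(to_four_classes(y_true) == to_four_classes(y_pred)))


if __name__ == "__main__":
    labels = [0.0, 1.0, 2.33, 3.0]
    preds = [0.2, 1.6, 2.0, 2.4]
    print(mean_absolute_error(labels, preds))  # approximately 0.4325
    print(four_class_accuracy(labels, preds))  # 0.5
```

Lower MAE and higher four-class accuracy both indicate better performance; in the study, these metrics were compared between the SSL-pretrained model and the baseline.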

Conclusions:

  • Self-supervised learning substantially improves audio-based emotion recognition, particularly in low-data regimes.
  • SSL effectively enhances affective computing models by leveraging unlabeled data.
  • The results support SSL's utility on encoded acoustic features, offering a practical approach for improving emotion recognition systems.