Audio self-supervised learning: A survey.

Shuo Liu¹, Adria Mallol-Ragolta¹, Emilia Parada-Cabaleiro²

¹Chair of Embedded Intelligence for Health Care & Wellbeing, University of Augsburg, 86159 Augsburg, Germany.

Patterns (New York, N.Y.), December 26, 2022

Summary

This summary is machine-generated.

Self-supervised learning (SSL) extracts general representations from data, reducing the need for annotation. This review surveys audio SSL methods, the use of audio in multi-modal SSL, and benchmarks for evaluating SSL in computer audition.

Keywords:

audio and speech processing; multi-modal SSL; representation learning; self-supervised learning; unsupervised learning

Area of Science:

Artificial Intelligence
Machine Learning
Audio Signal Processing

Background:

Self-supervised learning (SSL) learns general-purpose representations from the data itself, reducing reliance on human annotation.
SSL's success in computer vision and natural language processing has led to its expansion into audio and speech processing.
A comprehensive review of audio SSL methods is currently lacking.
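
To make the idea of learning without human labels concrete, here is a minimal sketch of a contrastive objective (InfoNCE), one widely used family of SSL losses. It is an illustration only and is not taken from this summary: the function names, the toy two-dimensional embeddings, and the temperature value are all assumptions. The supervision signal comes from pairing: two views of the same clip should have similar embeddings, while views of different clips should not.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss: anchors[i] should match positives[i]
    and be dissimilar to every positives[j] with j != i."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching pair as the "correct class".
        losses.append(log_sum - logits[i])
    return sum(losses) / len(losses)

# Hypothetical embeddings of two augmented views of three audio clips.
view_a = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
view_b_aligned = [[0.9, 0.1], [0.1, 0.9], [-0.9, 0.1]]
# Same embeddings, but paired with the wrong clips.
view_b_shuffled = [view_b_aligned[1], view_b_aligned[2], view_b_aligned[0]]

loss_aligned = info_nce(view_a, view_b_aligned)
loss_shuffled = info_nce(view_a, view_b_shuffled)
print(f"aligned pairs:  {loss_aligned:.4f}")
print(f"shuffled pairs: {loss_shuffled:.4f}")
```

In a real system the embeddings would come from a trainable audio encoder, and minimizing this loss would push the encoder to produce the aligned case. No annotation is involved at any point, which is the property the background statements describe.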

Purpose of the Study:

To provide an overview of self-supervised learning methods applied to audio and speech processing.
To summarize research on using audio in multi-modal SSL frameworks.
To identify benchmarks for evaluating SSL in computer audition.

Main Methods:

Literature review of existing self-supervised learning techniques for audio.
Analysis of empirical studies incorporating audio modality in multi-modal SSL.
Compilation and discussion of relevant benchmarks for audio SSL evaluation.

Main Results:

Categorization and summary of various audio SSL approaches.
Overview of multi-modal SSL frameworks leveraging audio data.
Identification of key benchmarks for assessing SSL performance in computer audition.

Conclusions:

The field of audio SSL is rapidly growing, with diverse methods and applications.
Further research is needed to explore open problems and future directions in audio SSL.
This review serves as a foundational resource for researchers in audio and speech self-supervised learning.