



Audio self-supervised learning: A survey.

Shuo Liu (1), Adria Mallol-Ragolta (1), Emilia Parada-Cabaleiro (2)

  • (1) Chair of Embedded Intelligence for Health Care & Wellbeing, University of Augsburg, 86159 Augsburg, Germany.

Patterns (New York, N.Y.) | December 26, 2022
Summary
This summary is machine-generated.

Self-supervised learning (SSL) extracts general representations from data, reducing the need for human annotation. This review surveys audio SSL methods, the use of audio in multi-modal SSL, and benchmarks for computer audition.

Keywords:
audio and speech processing; multi-modal SSL; representation learning; self-supervised learning; unsupervised learning



Area of Science:

  • Artificial Intelligence
  • Machine Learning
  • Audio Signal Processing

Background:

  • Self-supervised learning (SSL) excels at discovering general data representations, minimizing reliance on human annotation.
  • SSL's success in computer vision and natural language processing has led to its expansion into audio and speech processing.
  • A comprehensive review of audio SSL methods is currently lacking.
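To make the idea of learning representations without labels concrete, here is a minimal sketch of a contrastive objective (InfoNCE), one widely used family of SSL losses. This summary does not say which objectives the review covers, so the choice of loss is an assumption made for illustration, and the function names and the `temperature` value are hypothetical. The embeddings are assumed to come from some encoder applied to two augmented views of the same audio clip.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch of embedding pairs.

    anchors[i] and positives[i] are assumed to be embeddings of two views
    of the same clip. For anchors[i], every positives[j] with j != i acts
    as a negative. No human labels are involved.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        # Temperature-scaled similarity of this anchor to every candidate.
        logits = [cosine_similarity(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Negative log-probability of picking the true positive.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings (illustrative values only, not real audio features).
views_a = [[1.0, 0.0], [0.0, 1.0]]
matched = [[1.0, 0.0], [0.0, 1.0]]      # each positive agrees with its anchor
mismatched = [[0.0, 1.0], [1.0, 0.0]]   # positives swapped between clips

loss_matched = info_nce_loss(views_a, matched)
loss_mismatched = info_nce_loss(views_a, mismatched)
```

The loss is small when each anchor is closest to its own positive and large when it is closer to another clip's view, which is the signal that drives the encoder toward general-purpose representations.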

Purpose of the Study:

  • To provide an overview of self-supervised learning methods applied to audio and speech processing.
  • To summarize research on using audio in multi-modal SSL frameworks.
  • To identify benchmarks for evaluating SSL in computer audition.

Main Methods:

  • Literature review of existing self-supervised learning techniques for audio.
  • Analysis of empirical studies incorporating audio modality in multi-modal SSL.
  • Compilation and discussion of relevant benchmarks for audio SSL evaluation.

Main Results:

  • Categorization and summary of various audio SSL approaches.
  • Overview of multi-modal SSL frameworks leveraging audio data.
  • Identification of key benchmarks for assessing SSL performance in computer audition.

Conclusions:

  • The field of audio SSL is rapidly growing, with diverse methods and applications.
  • Further research is needed to explore open problems and future directions in audio SSL.
  • This review serves as a foundational resource for researchers in audio and speech self-supervised learning.