Audio2Gestures: Generating Diverse Gestures From Audio.

Jing Li, Di Kang, Wenjie Pei

IEEE Transactions on Visualization and Computer Graphics

May 17, 2023

Summary

This summary is machine-generated.

Generating realistic co-speech gestures from audio is challenging due to diverse human motion. This study introduces a novel method to model one-to-many audio-to-motion relationships, producing more varied and natural movements.

Area of Science:

Computer Vision
Artificial Intelligence
Human-Computer Interaction

Background:

Generating co-speech gestures from audio is complex due to the inherent one-to-many relationship between speech and motion.
Traditional models often predict average motions, leading to less diverse and engaging gestures.
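The averaging effect can be illustrated with a toy example. The sketch below is not taken from the study: it assumes a deterministic model trained with mean squared error, and the trajectory values and function names are hypothetical. When the same audio is paired with two equally valid gestures, the prediction that minimizes the expected squared error is their per-frame mean, which resembles neither gesture.

```python
from statistics import mean

# Hypothetical toy data: one audio clip paired with two equally valid
# gestures, each a short 1-D joint trajectory.
gesture_left = [-1.0, -0.8, -1.0]
gesture_right = [1.0, 0.8, 1.0]
targets = [gesture_left, gesture_right]


def mse(pred, target):
    """Mean squared error between two trajectories of equal length."""
    return mean((p - t) ** 2 for p, t in zip(pred, target))


def expected_loss(pred):
    """Average MSE of one fixed prediction over all valid targets."""
    return mean(mse(pred, t) for t in targets)


# The per-frame mean of the targets minimizes the expected squared error.
average_pred = [mean(frame) for frame in zip(*targets)]

print("average prediction:", average_pred)
print("loss of average prediction:", expected_loss(average_pred))
print("loss of predicting one real gesture:", expected_loss(gesture_left))
```

The averaged trajectory scores a lower expected loss than either real gesture, yet it is a flat motion that never occurs in the data, which is the behavior the one-to-many formulation is meant to avoid.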

Purpose of the Study:

To develop a novel approach for co-speech gesture generation that explicitly models the one-to-many audio-to-motion mapping.
To enhance the diversity and realism of generated gestures compared to existing methods.

Main Methods:

Proposed a Variational Autoencoder (VAE) framework that splits cross-modal latent codes into shared (audio-correlated) and motion-specific (diverse) components.
Introduced specialized training losses, including relaxed motion loss, bicycle constraint, and diversity loss, to address training complexities.
Validated the approach on 3D and 2D motion datasets, incorporating structured losses (e.g., short-time Fourier transform, STFT) for improved motion evaluation.
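The latent code split described above can be sketched at inference time as follows. This is a minimal illustration under stated assumptions, not the study's network: random linear maps stand in for trained encoders and decoders, and all dimensions, names (audio_encoder, decoder, generate), and feature values are hypothetical. The shared code is computed from the audio, while the motion-specific code is either sampled from a standard normal prior or supplied by the caller.

```python
import random

# Hypothetical sizes chosen only for illustration.
AUDIO_DIM, SHARED_DIM, SPECIFIC_DIM, MOTION_DIM = 4, 2, 2, 3


def make_linear(in_dim, out_dim, rng):
    """Random weight matrix (list of rows) standing in for a trained layer."""
    return [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]


def apply_linear(weights, x):
    """Matrix-vector product for a list-of-rows weight matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


rng = random.Random(0)
audio_encoder = make_linear(AUDIO_DIM, SHARED_DIM, rng)  # audio -> shared code
decoder = make_linear(SHARED_DIM + SPECIFIC_DIM, MOTION_DIM, rng)  # codes -> motion


def generate(audio_feat, specific=None):
    """Decode motion from the audio-derived shared code plus a specific code.

    If no motion-specific code is given, one is sampled from N(0, 1).
    """
    shared = apply_linear(audio_encoder, audio_feat)
    if specific is None:
        specific = [rng.gauss(0.0, 1.0) for _ in range(SPECIFIC_DIM)]
    return apply_linear(decoder, shared + specific)


audio_feat = [0.1, 0.5, -0.3, 0.9]
sample_a = generate(audio_feat)  # same audio, first sampled specific code
sample_b = generate(audio_feat)  # same audio, second sampled specific code
fixed = generate(audio_feat, specific=[0.5, -0.5])  # caller-chosen code

print(sample_a)
print(sample_b)
print(fixed)
```

Because only the motion-specific code changes between calls, the same audio yields different motions, while fixing that code makes the output repeatable. In a full model the fixed code would presumably come from a motion encoder rather than a hand-written vector.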

Main Results:

The proposed method significantly outperforms state-of-the-art approaches in generating more realistic and diverse co-speech gestures, both quantitatively and qualitatively.
Demonstrated compatibility with a range of backbones and motion representations, such as RNNs, Transformers, and Discrete Cosine Transform (DCT) modeling.
Showcased the ability to generate motion sequences with user-specified motion clips.
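As a rough illustration of how diversity might be quantified, the sketch below computes the mean pairwise Euclidean distance between motions generated for the same audio. This is an assumption made for illustration: the summary does not state which diversity metric the study uses, and the function name and sample values are hypothetical.

```python
from itertools import combinations
from math import dist


def mean_pairwise_distance(samples):
    """Average Euclidean distance over all unordered pairs of samples."""
    pairs = list(combinations(samples, 2))
    if not pairs:
        return 0.0
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


# Hypothetical flattened poses generated for one audio clip.
collapsed = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]  # model always outputs the same pose
varied = [[0.0, 0.0], [3.0, 4.0], [0.0, 0.0]]     # model outputs differing poses

print(mean_pairwise_distance(collapsed))
print(mean_pairwise_distance(varied))
```

A model that collapses to a single output scores zero under this measure, while a model that produces distinct motions for the same audio scores higher.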

Conclusions:

The novel latent code splitting strategy effectively captures diverse gestural information independent of audio.
The developed training strategies and evaluation metrics lead to superior motion dynamics and nuanced details in generated gestures.
The method offers a flexible and powerful solution for realistic and controllable co-speech gesture generation.