
Robust sound-guided image manipulation.

Seung Hyun Lee (1), Hyung-Gun Chi (2), Gyeongrok Oh (1)

  • (1) Department of Artificial Intelligence, Korea University, South Korea.

Neural Networks: the Official Journal of the International Neural Network Society | April 18, 2024
Summary
This summary is machine-generated.

This study introduces sound as a new input for image manipulation, enhancing semantic detail beyond text prompts. Sound-guided image manipulation yields more realistic and visually plausible results compared to text-only methods.

Keywords:
Image manipulation; Multi-modal representation learning; Self-supervised learning; Sound

Area of Science:

  • Computer Vision
  • Artificial Intelligence
  • Machine Learning

Background:

  • Recent advancements enable image manipulation via text prompts using models like StyleCLIP.
  • Text prompts often lack the semantic richness to capture nuanced details, limiting manipulation quality.

Purpose of the Study:

  • To explore the potential of incorporating sound as an additional modality for image manipulation.
  • To develop a novel approach for sound-guided image manipulation that surpasses text-based methods in semantic detail and plausibility.

Main Methods:

  • Proposed a method to extend the joint embedding space of images and text with sound.
  • Employed a direct latent optimization technique for image manipulation driven by audio input.
  • Trained and evaluated a unified image-text-sound embedding space.
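
The idea of direct latent optimization can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a hypothetical linear map `G` standing in for "generator plus image encoder", a random vector standing in for the audio embedding, and a hand-picked step size and regularization weight. It only shows the general pattern: adjust a latent code by gradient descent so that the embedding of the generated image moves closer (in cosine similarity) to the sound embedding, while staying near the starting latent.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def optimize_latent(w0, G, audio_emb, steps=200, lr=0.05, reg=0.01):
    """Toy latent optimization (all names and defaults are illustrative).

    Minimizes  1 - cos(G @ w, audio_emb) + reg * ||w - w0||^2
    with plain gradient descent, using an analytic gradient.
    """
    a = audio_emb / np.linalg.norm(audio_emb)  # unit-norm target embedding
    w = w0.copy()
    for _ in range(steps):
        e = G @ w                               # stand-in "image embedding"
        n = np.linalg.norm(e)
        # Gradient of cos(e, a) with respect to e, for unit-norm a.
        dcos_de = a / n - (e @ a) * e / n**3
        # Chain rule through the linear map, plus the proximity penalty.
        grad = -G.T @ dcos_de + 2.0 * reg * (w - w0)
        w -= lr * grad
    return w

# Hypothetical stand-ins: random linear "generator + encoder" and embeddings.
rng = np.random.default_rng(0)
G = rng.normal(size=(4, 8))        # maps an 8-D latent to a 4-D embedding
w0 = rng.normal(size=8)            # latent code of the source image
audio_emb = rng.normal(size=4)     # embedding of the guiding sound

w_edit = optimize_latent(w0, G, audio_emb)
before = cosine(G @ w0, audio_emb)
after = cosine(G @ w_edit, audio_emb)
print(f"cosine similarity before: {before:.3f}, after: {after:.3f}")
```

In a real system the linear map would be replaced by a pretrained image generator and image encoder, and the gradient would come from automatic differentiation. The proximity term here is one simple way to keep the edited latent near the original; the summary does not specify which regularizers the study uses.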

Main Results:

  • The proposed sound-guided image manipulation demonstrated superior semantic and visual plausibility compared to state-of-the-art text-guided and sound-guided methods.
  • Human evaluations confirmed the enhanced quality of sound-guided manipulations.
  • Downstream task evaluations validated the effectiveness of the learned joint embedding space in encoding sound information.

Conclusions:

  • Sound offers a richer source of semantic cues for image manipulation than text alone.
  • The proposed sound-guided approach significantly improves the realism and detail of manipulated images.
  • The integrated image-text-sound embedding space provides a robust foundation for multimodal AI applications.