Updated: May 5, 2026

A Vision-Based Subtitle Generator: Text Reconstruction via Subtle Vibrations from Videos.

Yan Wang¹, Yingchong Wang¹, Xiuqi Zhang¹

¹School of Mechanical Engineering, Beijing Institute of Technology, Haidian District, Beijing 100081, China.

Sensors (Basel, Switzerland)

March 14, 2026

Summary

This summary is machine-generated.

This study introduces a Vision-based Subtitle Generator (VSG) that converts sound-induced object vibrations into text. This novel approach uses phase-based motion estimation and a Transformer architecture for accurate speech recovery from visual data.

Area of Science:

Computer Vision
Acoustics
Signal Processing

Background:

Ambient sound, particularly speech, induces subtle vibrations in everyday objects.
These vibrations carry acoustic cues that can potentially be decoded into text.
Potential applications include monitoring and security.

Purpose of the Study:

To present the Vision-based Subtitle Generator (VSG).
To enable direct text recovery from high-speed videos of sound-induced object vibrations using a generative approach.
To reduce the dependency on large volumes of video data for training.

Main Methods:

Introduced a phase-based motion estimation (PME) technique, treating pixels as "independent microphones" to extract pseudo-acoustic signals.
Utilized a pretrained Hidden-unit Bidirectional Encoder Representations from Transformers (HuBERT) as the encoder for the VSG-Transformer architecture.

Keywords:

phase-based motion estimation (PME)
pretrained acoustic model
text reconstruction from vibrations
transformer

Leveraged a generative approach for vibration-to-text conversion.
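
The "pixels as independent microphones" idea behind phase-based motion estimation can be illustrated with a minimal sketch. This is not the authors' pipeline, which the summary does not describe in detail. It assumes a simplified 1-D case (one image row per frame) and a single complex Gabor filter; the function names `gabor_response` and `pixel_signal` and the `wavelength` and `sigma` values are hypothetical choices made for illustration. The local phase of the filter response shifts in proportion to sub-pixel motion, so tracking that phase over time at one pixel yields a pseudo-acoustic signal.

```python
import cmath
import math


def gabor_response(row, center, wavelength=8.0, sigma=4.0):
    """Complex Gabor filter response of a 1-D intensity row at one pixel."""
    omega = 2.0 * math.pi / wavelength  # spatial frequency in rad/pixel
    radius = int(4 * sigma)             # truncate the Gaussian window at 4 sigma
    total = 0j
    for k in range(-radius, radius + 1):
        x = center + k
        if 0 <= x < len(row):           # ignore samples outside the row
            window = math.exp(-(k * k) / (2.0 * sigma * sigma))
            total += row[x] * window * cmath.exp(-1j * omega * k)
    return total


def pixel_signal(frames, center, wavelength=8.0, sigma=4.0):
    """Estimated displacement (in pixels) of each frame relative to frame 0.

    `frames` is a list of 1-D intensity rows, one per video frame.
    """
    omega = 2.0 * math.pi / wavelength
    reference = gabor_response(frames[0], center, wavelength, sigma)
    signal = []
    for row in frames:
        response = gabor_response(row, center, wavelength, sigma)
        # Phase difference to the first frame, wrapped to (-pi, pi].
        dphi = cmath.phase(response * reference.conjugate())
        # A shift of d pixels changes the local phase by -omega * d.
        signal.append(-dphi / omega)
    return signal
```

Calling `pixel_signal` at several `center` positions gives one time series per pixel, which is the sense in which each pixel acts as its own microphone. The estimate is only unambiguous while the motion stays well below half of `wavelength`, which fits the sub-pixel vibrations discussed here.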

Main Results:

Achieved character error rates of 13.7% (Base) and 12.5% (Large) for text generation from chip bag vibrations.
Demonstrated the effectiveness of the generative approach in vibration-to-text transcription.
Showed robustness to lower sampling rates, maintaining performance with limited temporal sampling.
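
For readers unfamiliar with the metric, the character error rate reported above is conventionally the character-level edit distance between the reference text and the generated text, divided by the reference length. The summary does not state how the authors computed it, so the following is a sketch of the standard definition only; the function names are hypothetical.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two strings, counted in characters."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance normalized by the reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


# One substituted character out of eleven reference characters.
print(character_error_rate("hello world", "hallo world"))
```

Under this definition, a rate of 12.5% corresponds to roughly one character edit for every eight reference characters.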

Conclusions:

The VSG-Transformer effectively recovers text from sound-induced object vibrations.
The proposed methods significantly reduce the need for extensive video datasets.
The system shows promise for real-world applications in diverse acoustic environments.