Text-Guided Visual Representation Optimization for Sensor-Acquired Video Temporal Grounding.

Yun Tian1, Xiaobo Guo1, Jinsong Wang1

  • 1School of Optoelectronic Engineering, Changchun University of Science and Technology, Changchun 130022, China.

Sensors (Basel, Switzerland) | August 14, 2025
Summary
This summary is machine-generated.

This study introduces a novel framework to improve video temporal grounding (VTG) by optimizing visual representations using text guidance. The approach effectively bridges the cross-modal gap, enhancing semantic alignment for accurate video segment localization.

Keywords:
contrastive learning, cross-attention, cross-modal learning, representation optimization, video temporal grounding

Area of Science:

  • Computer Vision
  • Artificial Intelligence
  • Natural Language Processing

Background:

  • Video temporal grounding (VTG) aims to identify specific time segments in videos matching text queries.
  • Existing methods struggle with cross-modal semantic misalignment due to redundant visual data and independent text/video processing.
  • This misalignment hinders accurate localization of relevant video content based on natural language descriptions.

Purpose of the Study:

  • To propose a text-guided visual representation optimization framework to enhance semantic interpretation in video signals.
  • To narrow the cross-modal gap by leveraging textual information to focus on relevant spatiotemporal video content.
  • To improve the accuracy of video temporal grounding by refining visual representations.

Main Methods:

  • Utilized CLIP's unified cross-modal embedding space for representation structuring.
  • Introduced a Spatial Visual Representation Optimization (SVRO) module to refine intra-frame spatial information by selecting salient patches.
  • Developed a Temporal Visual Representation Optimization (TVRO) module with temporal triplet loss to refine inter-frame temporal relations and clip semantics.
  • Incorporated self-supervised contrastive loss for improved inter-clip discrimination.
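The three ideas in the methods above can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not say how SVRO scores patches or how the losses are defined, so the top-k cosine-similarity selection, the hinge form of the triplet loss, the InfoNCE-style contrastive loss, and the `margin` and `temperature` values are all assumptions. Plain Python lists stand in for CLIP embeddings, and every function name is hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_salient_patches(patches, text, k):
    """Assumed SVRO-like step: keep the k patches of one frame that are most
    similar to the text embedding, returned in their original order."""
    ranked = sorted(range(len(patches)),
                    key=lambda i: cosine(patches[i], text),
                    reverse=True)
    return sorted(ranked[:k])


def temporal_triplet_loss(anchor, positive, negative, margin=0.2):
    """Assumed TVRO-like loss: pull the anchor frame toward a frame from the
    same segment (positive) and push it away from a frame outside it."""
    return max(0.0, cosine(anchor, negative) - cosine(anchor, positive) + margin)


def contrastive_loss(clip, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (an assumption): negative log-softmax of the
    positive pair among the positive and all negative clips."""
    logits = [cosine(clip, positive) / temperature]
    logits += [cosine(clip, n) / temperature for n in negatives]
    top = max(logits)  # subtract the max for numerical stability
    log_sum = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_sum - logits[0]


# Toy 2-D "embeddings" for four patches of one frame and one text query.
patches = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
text = [1.0, 0.0]
kept = select_salient_patches(patches, text, k=2)
```

Returning the kept indices in their original order preserves the spatial layout of the frame, which is a design choice of this sketch rather than something the summary specifies.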

Main Results:

  • The proposed framework demonstrated superior performance on widely used benchmark datasets: Charades-STA, ActivityNet Captions, and TACoS.
  • Outperformed existing state-of-the-art methods across multiple evaluation metrics.
  • Effectively enhanced semantic alignment between text queries and video content.
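The summary does not name the evaluation metrics. As an assumption, the sketch below shows two measures commonly reported on these benchmarks: recall of the top prediction at a temporal IoU threshold ("R@1, IoU=m") and mean IoU. Segments are hypothetical `(start, end)` pairs in seconds, and the sample values are invented for illustration, not results from the paper.

```python
def temporal_iou(pred, gt):
    """Intersection over union of two (start, end) segments."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(predictions, ground_truths, threshold):
    """Fraction of queries whose top prediction reaches the IoU threshold."""
    hits = sum(temporal_iou(p, g) >= threshold
               for p, g in zip(predictions, ground_truths))
    return hits / len(ground_truths)


def mean_iou(predictions, ground_truths):
    """Average IoU of the top prediction over all queries."""
    scores = [temporal_iou(p, g) for p, g in zip(predictions, ground_truths)]
    return sum(scores) / len(scores)


# Illustrative values only: one partial overlap, one exact match, one miss.
predictions = [(2.0, 8.0), (0.0, 5.0), (10.0, 20.0)]
ground_truths = [(4.0, 10.0), (0.0, 5.0), (30.0, 40.0)]
r_at_05 = recall_at_1(predictions, ground_truths, threshold=0.5)
r_at_07 = recall_at_1(predictions, ground_truths, threshold=0.7)
m_iou = mean_iou(predictions, ground_truths)
```

Raising the threshold makes the recall stricter: the partially overlapping first prediction counts as a hit at 0.5 but not at 0.7.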

Conclusions:

  • The text-guided visual representation optimization framework significantly improves video temporal grounding.
  • The SVRO and TVRO modules effectively address spatial and temporal representation challenges, respectively.
  • The approach offers a promising direction for tackling cross-modal semantic misalignment in video understanding tasks.