Updated: Sep 11, 2025

Text-Guided Visual Representation Optimization for Sensor-Acquired Video Temporal Grounding.

Yun Tian¹, Xiaobo Guo¹, Jinsong Wang¹

¹School of Optoelectronic Engineering, Changchun University of Science and Technology, Changchun 130022, China.

Sensors (Basel, Switzerland)

August 14, 2025

Summary

This summary is machine-generated.

This study introduces a novel framework to improve video temporal grounding (VTG) by optimizing visual representations using text guidance. The approach effectively bridges the cross-modal gap, enhancing semantic alignment for accurate video segment localization.

Keywords:

contrastive learning; cross-attention; cross-modal learning; representation optimization; video temporal grounding

Area of Science:

Computer Vision
Artificial Intelligence
Natural Language Processing

Background:

Video temporal grounding (VTG) aims to identify specific time segments in videos matching text queries.
Existing methods struggle with cross-modal semantic misalignment, which arises from redundant visual data and from processing text and video independently.
This misalignment hinders accurate localization of relevant video content based on natural language descriptions.

Purpose of the Study:

To propose a text-guided visual representation optimization framework to enhance semantic interpretation in video signals.
To narrow the cross-modal gap by leveraging textual information to focus on relevant spatiotemporal video content.
To improve the accuracy of video temporal grounding by refining visual representations.

Main Methods:

Utilized CLIP's unified cross-modal embedding space for representation structuring.
Introduced a Spatial Visual Representation Optimization (SVRO) module to refine intra-frame spatial information by selecting salient patches.
Developed a Temporal Visual Representation Optimization (TVRO) module with a temporal triplet loss to refine inter-frame temporal relations and video-clip semantics.
Incorporated self-supervised contrastive loss for improved inter-clip discrimination.
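
The ideas behind the methods listed above can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not give the exact SVRO selection rule or the loss formulations, so the top-k cosine-similarity selection, the margin-based triplet loss, and the InfoNCE-style contrastive loss below are common stand-ins chosen for illustration. All function names, the margin, and the temperature are hypothetical, and the embeddings are plain Python lists standing in for vectors from a shared text/video space such as CLIP's.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_salient_patches(patches, text, k):
    """Assumed stand-in for spatial refinement: keep the k patch embeddings
    most similar to the text embedding, preserving their original order."""
    ranked = sorted(range(len(patches)),
                    key=lambda i: cosine(patches[i], text),
                    reverse=True)
    keep = sorted(ranked[:k])
    return [patches[i] for i in keep]


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Generic margin-based triplet loss on cosine similarity: zero once the
    positive is more similar to the anchor than the negative by `margin`."""
    return max(0.0, cosine(anchor, negative) - cosine(anchor, positive) + margin)


def info_nce(query, keys, positive_index, temperature=0.07):
    """InfoNCE-style contrastive loss: negative log-softmax of the positive
    key's similarity among all keys (computed with a stable log-sum-exp)."""
    logits = [cosine(query, key) / temperature for key in keys]
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_sum - logits[positive_index]


# Toy 2-D embeddings: the text query points along the first axis.
text = [1.0, 0.0]
patches = [[0.9, 0.1], [0.0, 1.0], [1.0, 0.0], [-1.0, 0.0]]
salient = select_salient_patches(patches, text, k=2)
```

In this toy setup, `salient` holds the first and third patches, the two that point most nearly in the direction of the text embedding. The triplet loss is zero when the matching clip is already well separated from the non-matching one and grows as the ordering is violated.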

Main Results:

The proposed framework demonstrated superior performance on widely used benchmark datasets: Charades-STA, ActivityNet Captions, and TACoS.
Outperformed existing state-of-the-art methods across multiple evaluation metrics.
Effectively enhanced semantic alignment between text queries and video content.
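
The summary does not name the evaluation metrics. As an assumption, the sketch below shows the temporal intersection-over-union (IoU) and Recall@1 measures commonly reported on video temporal grounding benchmarks. The function names and the example segments are hypothetical and are not results from the paper.

```python
def temporal_iou(pred, gt):
    """IoU of two time segments given as (start, end) pairs in seconds."""
    (ps, pe), (gs, ge) = pred, gt
    inter = max(0.0, min(pe, ge) - max(ps, gs))
    union = (pe - ps) + (ge - gs) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(preds, gts, threshold):
    """Fraction of queries whose top prediction reaches the IoU threshold."""
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(preds, gts))
    return hits / len(gts)


# Two made-up queries: one partially overlapping prediction, one exact match.
predictions = [(2.0, 6.0), (0.0, 10.0)]
ground_truth = [(4.0, 8.0), (0.0, 10.0)]
score = recall_at_1(predictions, ground_truth, threshold=0.5)
```

Here the first prediction overlaps its ground truth for 2 of 6 total seconds (IoU of one third) and misses the 0.5 threshold, while the second matches exactly, so `score` is 0.5.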

Conclusions:

The text-guided visual representation optimization framework significantly improves video temporal grounding.
The SVRO and TVRO modules effectively address spatial and temporal representation challenges, respectively.
The approach offers a promising direction for tackling cross-modal semantic misalignment in video understanding tasks.