Language-Aware Spatial-Temporal Collaboration for Referring Video Segmentation.

Tianrui Hui, Si Liu, Zihan Ding

IEEE Transactions on Pattern Analysis and Machine Intelligence

April 5, 2023

Summary

This summary is machine-generated.

This study introduces a novel framework for referring video segmentation, improving object mask prediction by combining temporal and spatial feature encoders. The method enhances accuracy by adaptively modulating cross-modal interactions and propagating language-aware semantic information.

Area of Science:

Computer Vision
Artificial Intelligence
Machine Learning

Background:

Referring video segmentation aims to identify objects in videos based on natural language descriptions.
Existing 3D Convolutional Neural Network (CNN) methods struggle with spatial misalignment, leading to inaccurate segmentation masks.
The challenge lies in effectively integrating temporal action recognition with precise spatial localization.

Purpose of the Study:

To develop an advanced framework for referring video segmentation that overcomes limitations of previous methods.
To improve the accuracy of segmentation masks for referred objects in videos.
To enhance the integration of language understanding with visual feature extraction.

Main Methods:

Proposed a language-aware spatial-temporal collaboration framework with separate 3D temporal and 2D spatial encoders.
Introduced Cross-Modal Adaptive Modulation (CMAM) and CMAM+ modules for adaptive multimodal feature extraction.
Developed a Language-Aware Semantic Propagation (LASP) module in the decoder for improved feature highlighting and suppression.
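
The idea of letting a sentence modulate features from two visual streams can be illustrated with a minimal sketch. This is not the paper's CMAM or LASP implementation, whose details are not given in this summary; it is a generic language-conditioned channel gate, and every name, shape, and weight here (`language_gate`, `fuse_streams`, `w_t`, `w_s`) is a hypothetical chosen for illustration.

```python
import numpy as np


def sigmoid(x):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def language_gate(visual, sentence, w_gate):
    """Scale each visual channel by a gate predicted from a sentence embedding.

    visual:   (C, H, W) feature map
    sentence: (D,) sentence embedding
    w_gate:   (C, D) hypothetical projection weights
    """
    gate = sigmoid(w_gate @ sentence)      # (C,), each value in (0, 1)
    return visual * gate[:, None, None]    # broadcast over H and W


def fuse_streams(temporal_clip, spatial_frame, sentence, w_t, w_s):
    """Gate a temporal stream and a spatial stream with language, then sum them.

    temporal_clip: (C, T, H, W) features from a hypothetical 3D encoder
    spatial_frame: (C, H, W) features from a hypothetical 2D encoder
    """
    temporal = temporal_clip.mean(axis=1)  # pool over time -> (C, H, W)
    t_mod = language_gate(temporal, sentence, w_t)
    s_mod = language_gate(spatial_frame, sentence, w_s)
    return t_mod + s_mod


# Random stand-ins for encoder outputs (assumption: no real model is used here).
rng = np.random.default_rng(0)
C, T, H, W, D = 4, 8, 5, 5, 6
clip = rng.normal(size=(C, T, H, W))
frame = rng.normal(size=(C, H, W))
sent = rng.normal(size=D)
w_t = rng.normal(size=(C, D))
w_s = rng.normal(size=(C, D))

fused = fuse_streams(clip, frame, sent, w_t, w_s)
print(fused.shape)  # (4, 5, 5)
```

In this sketch the sentence only rescales channels; a real system would learn the projection weights and typically use richer, spatially varying interactions.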

Main Results:

The proposed framework demonstrated superior performance on four benchmark datasets for referring video segmentation.
Achieved state-of-the-art results, outperforming existing methods in segmentation accuracy.
Validated the effectiveness of the spatial-temporal collaboration and language-aware modules.
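
The summary does not say which metric was used to measure segmentation accuracy. Intersection over union (IoU) between a predicted mask and a ground-truth mask is one common choice for segmentation tasks, and the following sketch shows how it could be computed for a single frame. The function name `mask_iou` and the toy masks are assumptions for illustration, not values from the study.

```python
import numpy as np


def mask_iou(pred, target):
    """Intersection over union of two binary masks with the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Both masks are empty; treat them as a perfect match (a convention).
        return 1.0
    intersection = np.logical_and(pred, target).sum()
    return float(intersection / union)


# Toy 2x3 masks: 2 pixels overlap, 4 pixels are covered by either mask.
pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
target = np.array([[1, 0, 0],
                   [0, 1, 1]])
print(mask_iou(pred, target))  # 0.5
```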

Conclusions:

The developed framework significantly advances the state-of-the-art in referring video segmentation.
The proposed CMAM, CMAM+, and LASP modules effectively enhance multimodal feature interaction and semantic propagation.
This approach offers a more robust solution for accurately segmenting referred objects in videos using natural language.