



Language-Aware Spatial-Temporal Collaboration for Referring Video Segmentation.

Tianrui Hui, Si Liu, Zihan Ding

    IEEE Transactions on Pattern Analysis and Machine Intelligence | April 5, 2023
    Summary
    This summary is machine-generated.

    This study introduces a novel framework for referring video segmentation, improving object mask prediction by combining temporal and spatial feature encoders. The method enhances accuracy by adaptively modulating cross-modal interactions and propagating language-aware semantic information.


    Area of Science:

    • Computer Vision
    • Artificial Intelligence
    • Machine Learning

    Background:

    • Referring video segmentation aims to identify objects in videos based on natural language descriptions.
    • Existing 3D Convolutional Neural Network (CNN) methods struggle with spatial misalignment, leading to inaccurate segmentation masks.
    • The challenge lies in effectively integrating temporal action recognition with precise spatial localization.

    Purpose of the Study:

    • To develop an advanced framework for referring video segmentation that overcomes limitations of previous methods.
    • To improve the accuracy of segmentation masks for referred objects in videos.
    • To enhance the integration of language understanding with visual feature extraction.

    Main Methods:

    • Proposed a language-aware spatial-temporal collaboration framework with separate 3D temporal and 2D spatial encoders.
    • Introduced Cross-Modal Adaptive Modulation (CMAM) and CMAM+ modules for adaptive multimodal feature extraction.
    • Developed a Language-Aware Semantic Propagation (LASP) module in the decoder for improved feature highlighting and suppression.
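The summary does not describe how the CMAM modules work internally, so the following is only a minimal sketch of the general idea of language-conditioned feature modulation, not the authors' implementation. It assumes a simple channel-wise gating scheme: a sentence embedding is projected to one gate per visual channel, and each channel is scaled by its gate. The names `language_gate`, `modulate`, `weights`, and the toy values are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def language_gate(lang_vec, weights):
    """Return one gate in (0, 1) per visual channel, computed from the sentence embedding."""
    return [sigmoid(sum(w * l for w, l in zip(row, lang_vec))) for row in weights]


def modulate(visual, gates):
    """Scale every spatial value in channel c by gates[c]."""
    return [[[g * v for v in row] for row in channel]
            for channel, g in zip(visual, gates)]


# Toy inputs (assumed): 2 channels of 2x2 visual features, 3-dim sentence embedding.
visual = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[1.0, 1.0], [1.0, 1.0]],
]
lang_vec = [1.0, 0.0, -1.0]
weights = [
    [0.0, 0.0, 0.0],      # projects to 0, so the gate is sigmoid(0) = 0.5
    [10.0, 0.0, -10.0],   # projects to 20, so the gate is close to 1
]

gates = language_gate(lang_vec, weights)
modulated = modulate(visual, gates)
```

In this toy setup the first channel is halved and the second passes through almost unchanged, which illustrates how a language signal can emphasize some visual channels over others. A real model would learn `weights` and would typically operate on tensors rather than nested lists.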

    Main Results:

    • The proposed framework achieved state-of-the-art results on four benchmark datasets for referring video segmentation, outperforming existing methods in segmentation accuracy.
    • Validated the effectiveness of the spatial-temporal collaboration and language-aware modules.
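The summary does not name the accuracy metric used. As an assumption, intersection-over-union (IoU) between a predicted and a ground-truth binary mask is one common way to score segmentation quality. The sketch below computes it for a single frame; `mask_iou` and the toy masks are hypothetical names and data, not taken from the study.

```python
def mask_iou(pred, target):
    """Intersection-over-union of two same-sized binary masks (lists of rows of 0/1)."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # Two empty masks agree perfectly; treat that case as IoU = 1 (a convention, assumed here).
    return inter / union if union else 1.0


# Toy 2x3 masks: 2 pixels overlap, 4 pixels are covered by at least one mask.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]

score = mask_iou(pred, target)  # 2 / 4
```

For a video, a per-frame score like this would usually be averaged over frames and over referred objects; the exact protocol depends on the benchmark.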

    Conclusions:

    • The developed framework significantly advances the state of the art in referring video segmentation.
    • The proposed CMAM, CMAM+, and LASP modules effectively enhance multimodal feature interaction and semantic propagation.
    • This approach offers a more robust solution for accurately segmenting referred objects in videos using natural language.