



Updated: Aug 3, 2025


Learning Complementary Spatial-Temporal Transformer for Video Salient Object Detection.

Nian Liu, Kepan Nan, Wangbo Zhao

    IEEE Transactions on Neural Networks and Learning Systems | April 7, 2023
    Summary
    This summary is machine-generated.

    This study introduces CoSTFormer, a novel method for video salient object detection (VSOD) that effectively mines complementary spatial-temporal (ST) knowledge. It achieves state-of-the-art results by integrating appearance, motion, and enhanced ST context.


    Area of Science:

    • Computer Vision
    • Artificial Intelligence
    • Machine Learning

    Background:

    • Video salient object detection (VSOD) requires integrating appearance, motion, and spatial-temporal (ST) information.
    • Existing methods often fail to fully exploit the complementary nature of short-term/long-term temporal cues and global-local spatial contexts.
    • There is a need for methods that can effectively model and leverage these complementary ST contexts.

    Purpose of the Study:

    • To propose a novel complementary ST transformer (CoSTFormer) for VSOD.
    • To effectively mine and aggregate complementary spatial-temporal contexts, including long-short temporal cues and global-local spatial context.
    • To introduce a flow-guided window attention (FGWA) mechanism to address motion-related challenges in attention mechanisms.
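The global-local distinction named above can be illustrated with a small NumPy sketch. This is not the CoSTFormer code: the function names (`dense_attention`, `window_attention`), the single-head formulation without learned projections, and the window size are all assumptions chosen for illustration. It only contrasts attention over every token pair (global context) with attention restricted to fixed windows (local context).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention; q is (Nq, C), k and v are (Nk, C).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def dense_attention(feat_a, feat_b):
    # Global context: every token of frame A attends to every token of frame B.
    h, w, c = feat_a.shape
    tokens_b = feat_b.reshape(-1, c)
    return attention(feat_a.reshape(-1, c), tokens_b, tokens_b).reshape(h, w, c)

def window_attention(feat_a, feat_b, win=4):
    # Local context: tokens in each window of frame A attend only to the
    # tokens at the same window position in frame B (hypothetical layout).
    h, w, c = feat_a.shape
    out = np.empty_like(feat_a)
    for y in range(0, h, win):
        for x in range(0, w, win):
            patch_a = feat_a[y:y + win, x:x + win]
            patch_b = feat_b[y:y + win, x:x + win]
            ph, pw, _ = patch_a.shape
            kv = patch_b.reshape(-1, c)
            out[y:y + win, x:x + win] = attention(
                patch_a.reshape(-1, c), kv, kv
            ).reshape(ph, pw, c)
    return out
```

Dense attention costs grow with the square of the token count, which is one common reason to reserve it for a few frames and use windows when many frames are involved; whether the paper makes that trade-off for the same reason is not stated in this summary.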

    Main Methods:

    • Developed CoSTFormer with short-global and long-local branches to capture complementary ST contexts.
    • Employed dense pairwise attention to capture global context in the short-global branch, and local attention windows to fuse long-term temporal information in the long-local branch.
    • Introduced flow-guided window attention (FGWA) to align attention windows with object and camera movements.
    • Utilized fused appearance and motion features within the CoSTFormer framework.
    • Presented a pseudo video generation method for training ST saliency models using static images.
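The idea of aligning attention windows with motion, listed above, might be sketched as follows. This is a hedged illustration and not the authors' FGWA: the summary gives no implementation details, so the mean-flow shift, the `(dy, dx)` flow convention, the border clipping, and the names `shifted_window` and `flow_guided_window_attention` are all assumptions. Each window of the reference frame attends to a window in the target frame that has been moved by the average optical flow inside the reference window.

```python
import numpy as np

def attention(q, k, v):
    # Scaled dot-product attention; q is (Nq, C), k and v are (Nk, C).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

def shifted_window(flow, y, x, win):
    # flow: (H, W, 2) displacements (dy, dx) from reference to target frame
    # (assumed convention). Returns the top-left corner of the target window,
    # shifted by the mean flow and clipped to stay inside the frame.
    height, width, _ = flow.shape
    dy, dx = flow[y:y + win, x:x + win].reshape(-1, 2).mean(axis=0)
    ny = int(np.clip(np.rint(y + dy), 0, height - win))
    nx = int(np.clip(np.rint(x + dx), 0, width - win))
    return ny, nx

def flow_guided_window_attention(feat_ref, feat_tgt, flow, win=4):
    # Assumes H and W are multiples of win, to keep the sketch short.
    h, w, c = feat_ref.shape
    out = np.empty_like(feat_ref)
    for y in range(0, h, win):
        for x in range(0, w, win):
            ny, nx = shifted_window(flow, y, x, win)
            q = feat_ref[y:y + win, x:x + win].reshape(-1, c)
            kv = feat_tgt[ny:ny + win, nx:nx + win].reshape(-1, c)
            out[y:y + win, x:x + win] = attention(q, kv, kv).reshape(win, win, c)
    return out
```

With a fixed window grid, a moving object can leave the window it started in, so the query tokens no longer see it; shifting the target window by the flow keeps the same content in view. A per-window mean is the simplest possible choice and would blur distinct object and camera motions that fall in one window.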

    Main Results:

    • CoSTFormer effectively integrates appearance, motion, and complementary ST contexts.
    • The proposed FGWA mechanism successfully handles object and camera motion.
    • Achieved new state-of-the-art results on multiple benchmark datasets for VSOD.
    • Demonstrated the effectiveness of the pseudo video generation method for training.

    Conclusions:

    • The proposed CoSTFormer advances video salient object detection, setting new state-of-the-art results on multiple benchmark datasets.
    • The complementary ST context modeling and FGWA mechanism are crucial for high-performance VSOD.
    • The method offers a robust approach for integrating diverse visual cues for saliency prediction.