Learning Complementary Spatial-Temporal Transformer for Video Salient Object Detection.

Nian Liu, Kepan Nan, Wangbo Zhao

IEEE Transactions on Neural Networks and Learning Systems

April 7, 2023

Summary

This summary is machine-generated.

This study introduces CoSTFormer, a novel method for video salient object detection (VSOD) that effectively mines complementary spatial-temporal (ST) knowledge. It achieves state-of-the-art results by integrating appearance, motion, and enhanced ST context.

Area of Science:

Computer Vision
Artificial Intelligence
Machine Learning

Background:

Video salient object detection (VSOD) requires integrating appearance, motion, and spatial-temporal (ST) information.
Existing methods often fail to fully exploit the complementary nature of short-term/long-term temporal cues and global-local spatial contexts.
There is a need for methods that can effectively model and leverage these complementary ST contexts.

Purpose of the Study:

To propose a novel complementary ST transformer (CoSTFormer) for VSOD.
To effectively mine and aggregate complementary spatial-temporal contexts, including long-short temporal cues and global-local spatial context.
To introduce a flow-guided window attention (FGWA) mechanism that keeps attention windows aligned with object and camera motion.

Main Methods:

Developed CoSTFormer with short-global and long-local branches to capture complementary ST contexts.
Employed dense pairwise attention in the short-global branch to aggregate global context, and local attention windows in the long-local branch to fuse long-term temporal information.
Introduced flow-guided window attention (FGWA) to align attention windows with object and camera movements.
Utilized fused appearance and motion features within the CoSTFormer framework.
Presented a pseudo video generation method for training ST saliency models using static images.
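
The window alignment idea behind FGWA can be illustrated with a minimal sketch. This is not the authors' implementation: the summary does not say how the flow is applied, so the sketch assumes the simplest possible rule, which is to move each square attention window by the mean optical flow inside it and clamp it to the frame. The function name `shift_window_by_flow`, the `(top, left, size)` window layout, and the `flow[y][x] = (dx, dy)` convention are all hypothetical.

```python
from typing import List, Tuple

# Assumed layout: flow[y][x] = (dx, dy), the pixel displacement from frame t to t+1.
Flow = List[List[Tuple[float, float]]]
Window = Tuple[int, int, int]  # (top, left, size) of a square attention window


def shift_window_by_flow(window: Window, flow: Flow) -> Window:
    """Move a window from frame t to frame t+1 by the mean flow inside it.

    Hypothetical rule for illustration only; the result is clamped so the
    shifted window stays fully inside the frame.
    """
    top, left, size = window
    height, width = len(flow), len(flow[0])

    # Average the displacement over every pixel covered by the window.
    dx_sum = 0.0
    dy_sum = 0.0
    for y in range(top, top + size):
        for x in range(left, left + size):
            dx, dy = flow[y][x]
            dx_sum += dx
            dy_sum += dy
    count = size * size

    new_top = round(top + dy_sum / count)
    new_left = round(left + dx_sum / count)

    # Clamp to the valid range of window origins.
    new_top = max(0, min(height - size, new_top))
    new_left = max(0, min(width - size, new_left))
    return (new_top, new_left, size)


# An 8x8 frame where everything moves 2 px right and 1 px down.
uniform_flow: Flow = [[(2.0, 1.0) for _ in range(8)] for _ in range(8)]
moved = shift_window_by_flow((2, 2, 3), uniform_flow)
```

With a fixed window grid, a moving object can leave the window that covered it in the previous frame. Shifting the window with the flow keeps the same content inside it across frames, which is the property the summary attributes to FGWA.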

Main Results:

CoSTFormer effectively integrates appearance, motion, and complementary ST contexts.
The proposed FGWA mechanism successfully handles object and camera motion.
Achieved new state-of-the-art results on multiple benchmark datasets for VSOD.
Demonstrated the effectiveness of the pseudo video generation method for training.
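
The summary does not describe how pseudo videos are built from static images. As a rough illustration of the general idea only, the sketch below assumes one simple scheme: slide a crop window across a still image and its saliency mask along a given trajectory, so that the sequence of crops imitates camera motion. The names `make_pseudo_video` and `crop`, and the nested-list image format, are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]  # assumed format: image[row][col], one value per pixel


def crop(img: Image, top: int, left: int, height: int, width: int) -> Image:
    """Return the height x width patch whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in img[top:top + height]]


def make_pseudo_video(
    image: Image,
    mask: Image,
    crop_h: int,
    crop_w: int,
    offsets: Sequence[Tuple[int, int]],
) -> Tuple[List[Image], List[Image]]:
    """Build frames and matching masks by cropping along a trajectory.

    `offsets` holds one (top, left) crop origin per output frame. The same
    crop is applied to the image and the mask so the labels stay aligned.
    """
    img_h, img_w = len(image), len(image[0])
    frames: List[Image] = []
    masks: List[Image] = []
    for top, left in offsets:
        if top < 0 or left < 0 or top + crop_h > img_h or left + crop_w > img_w:
            raise ValueError(f"crop at {(top, left)} falls outside the image")
        frames.append(crop(image, top, left, crop_h, crop_w))
        masks.append(crop(mask, top, left, crop_h, crop_w))
    return frames, masks


# A 4x4 toy image whose pixel value is its flat index, with one salient pixel.
toy_image: Image = [[r * 4 + c for c in range(4)] for r in range(4)]
toy_mask: Image = [[1 if (r, c) == (1, 1) else 0 for c in range(4)] for r in range(4)]
frames, masks = make_pseudo_video(toy_image, toy_mask, 2, 2, [(0, 0), (0, 1), (1, 1)])
```

Because the salient pixel changes position from frame to frame, a model trained on such sequences sees apparent motion even though only one annotated still image was available.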

Conclusions:

The proposed CoSTFormer advances video salient object detection, setting new state-of-the-art results on multiple benchmark datasets.
The complementary ST context modeling and FGWA mechanism are crucial for high-performance VSOD.
The method offers a robust approach for integrating diverse visual cues for saliency prediction.