Learning Local and Global Temporal Contexts for Video Semantic Segmentation.

Guolei Sun, Yun Liu, Henghui Ding

IEEE Transactions on Pattern Analysis and Machine Intelligence

April 10, 2024

Summary

This summary is machine-generated.

This study introduces novel methods for video semantic segmentation (VSS) by integrating local and global temporal contexts. The proposed techniques enhance feature mining for more accurate video understanding.

Area of Science:

Computer Vision
Artificial Intelligence
Machine Learning

Background:

Contextual information is crucial for accurate video semantic segmentation (VSS).
Existing methods often focus on either local temporal contexts (LTC) or global temporal contexts (GTC), but rarely both.
Simultaneously learning static and motional contexts within LTC offers complementary benefits.

Purpose of the Study:

To propose a unified approach for learning local temporal contexts (LTC) in VSS.
To introduce a method for incorporating global temporal contexts (GTC) to further improve VSS performance.
To enhance the accuracy and effectiveness of video semantic segmentation.

Main Methods:

A Coarse-to-Fine Feature Mining (CFFM) technique learns a unified representation of LTC.
CFFM comprises Coarse-to-Fine Feature Assembling (CFFA) for abstracting static and motional contexts, and Cross-frame Feature Mining (CFM) for enhancing features from neighboring frames.
CFFM++ extends CFFM by incorporating GTC through k-means clustering of sampled frames and CFM for prototype refinement.
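
The global-context idea above can be illustrated with a minimal sketch: cluster features from frames sampled across the video with k-means, treat the cluster centers as prototypes, and let target-frame features attend to those prototypes. This is not the authors' implementation. The feature dimension, the number of clusters, the toy data, and the helper names `kmeans` and `attend` are all hypothetical, and plain dot-product attention with a residual connection is used only as a stand-in for CFM.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means over a list of equal-length feature vectors."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assignment step: group each point with its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            groups[dists.index(min(dists))].append(p)
        # Update step: move each center to the mean of its group.
        for j, g in enumerate(groups):
            if g:  # keep the old center if a cluster ends up empty
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return centers


def attend(queries, prototypes):
    """Add a softmax-weighted sum of the prototypes to each query (assumed stand-in for CFM)."""
    dim = len(prototypes[0])
    refined = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, p)) / math.sqrt(dim) for p in prototypes]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        context = [
            sum(w / total * p[d] for w, p in zip(weights, prototypes))
            for d in range(dim)
        ]
        refined.append([a + c for a, c in zip(q, context)])
    return refined


# Toy data: hypothetical 4-D features from frames sampled across a video,
# drawn from two well-separated groups.
rng = random.Random(1)
sampled = [[rng.gauss(m, 0.1) for _ in range(4)] for m in (0.0, 5.0) for _ in range(50)]
prototypes = kmeans(sampled, k=2)

# Hypothetical target-frame features, refined with the global prototypes.
target = [[rng.gauss(5.0, 0.1) for _ in range(4)] for _ in range(3)]
refined = attend(target, prototypes)
```

In this toy setting the prototypes summarize the whole clip in a handful of vectors, so the target frame can draw on global context without attending to every sampled frame.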

Main Results:

The proposed CFFM method effectively learns a unified representation of local temporal contexts.
CFFM++ demonstrates improved performance by additionally leveraging global temporal contexts.
Both CFFM and CFFM++ achieve state-of-the-art results on popular video semantic segmentation benchmarks.

Conclusions:

Simultaneously learning static and motional contexts via CFFM significantly enhances VSS.
Integrating global temporal contexts with CFFM++ further boosts segmentation accuracy.
The proposed methods offer a more comprehensive approach to exploiting temporal information for VSS.