Updated: May 16, 2026

Mask-Guided Self-Supervised Video Object Segmentation.

Ruijie Quan, Liulei Li, Zongxin Yang

    IEEE Transactions on Pattern Analysis and Machine Intelligence | May 14, 2026
    Summary
    This summary is machine-generated.

    This study introduces a self-supervised framework for video object segmentation (VOS) that learns mask propagation without manual labels. It achieves state-of-the-art results by generating pseudo-labels through pixel clustering and incorporating object-level context.

    Area of Science:

    • Computer Vision
    • Machine Learning
    • Artificial Intelligence

    Background:

    • Video Object Segmentation (VOS) typically requires extensive labeled data.
    • Existing self-supervised methods often rely on indirect label propagation, limiting performance.
    • There is a need for effective self-supervised VOS methods that can learn directly from unlabeled videos.
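For readers unfamiliar with label propagation, the following is a minimal sketch of the general idea: labels from a reference frame are carried to a target frame through a softmax affinity between per-pixel features. It is an illustration only, not the method of any specific paper. The function name `propagate_labels`, the `temperature` value, and the flattened `(N, D)` feature layout are all assumptions made for this example.

```python
import numpy as np

def propagate_labels(feat_ref, feat_tgt, mask_ref, temperature=0.07):
    """Carry soft labels from a reference frame to a target frame.

    feat_ref: (N_ref, D) L2-normalised per-pixel features of the reference frame.
    feat_tgt: (N_tgt, D) L2-normalised per-pixel features of the target frame.
    mask_ref: (N_ref, K) one-hot or soft labels of the reference pixels.
    Returns an (N_tgt, K) array of soft labels for the target pixels.
    (Hypothetical helper; names and defaults are assumptions.)
    """
    # Similarity between every target pixel and every reference pixel.
    sim = feat_tgt @ feat_ref.T / temperature            # (N_tgt, N_ref)
    sim -= sim.max(axis=1, keepdims=True)                # numerical stability
    affinity = np.exp(sim)
    affinity /= affinity.sum(axis=1, keepdims=True)      # row-wise softmax
    # Each target pixel receives an affinity-weighted average of reference labels.
    return affinity @ mask_ref
```

Because the mask is never seen during feature learning in this scheme, segmentation quality depends entirely on how well the features match across frames, which is the sense in which such propagation is "indirect".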

    Purpose of the Study:

    • To develop a unified self-supervised framework for video object segmentation (mask propagation).
    • To enable direct learning of mask-guided sequential segmentation from unlabeled video data.
    • To improve the precision and reliability of pseudo-segmentation labels for self-supervised learning.

    Main Methods:

    • A unified framework simultaneously modeling cross-frame dense correspondence and object-level context.
    • Alternating between clustering video pixels for pseudo-label generation and using these labels for mask encoding/decoding.
    • Incorporating unsupervised correspondence learning to ensure representation generality and avoid cluster degeneracy.
    • Transitioning from offline to online clustering for streamlined integration and faster training.
    • Utilizing a semantic centroids pool for enhanced precision and reliability of pseudo-labels.
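The clustering steps listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the summary does not say which clustering algorithm or update rule is used, so the sketch assumes nearest-centroid assignment by cosine similarity and an exponential-moving-average update applied per batch. The names `assign_pseudo_labels`, `update_centroids`, and `momentum` are hypothetical, and the `centroids` array is only a loose stand-in for the semantic centroids pool.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def assign_pseudo_labels(pixel_feats, centroids):
    """Label each pixel with the index of its most similar centroid.

    pixel_feats: (N, D) per-pixel features; centroids: (K, D).
    Assumption: cosine similarity is the matching criterion.
    """
    sims = l2_normalize(pixel_feats) @ l2_normalize(centroids).T   # (N, K)
    return sims.argmax(axis=1)

def update_centroids(pixel_feats, labels, centroids, momentum=0.9):
    """Online (per-batch) centroid update via exponential moving average.

    Assumption: an EMA rule stands in for whatever online update is used.
    """
    new_centroids = centroids.copy()
    for k in range(centroids.shape[0]):
        members = pixel_feats[labels == k]
        if len(members) == 0:
            continue  # leave clusters with no assigned pixels untouched
        new_centroids[k] = (momentum * centroids[k]
                            + (1.0 - momentum) * members.mean(axis=0))
    return new_centroids
```

In a training loop, each batch would call `assign_pseudo_labels` to obtain masks for the segmentation objective and then `update_centroids` to refresh the pool, so no separate offline clustering pass over the dataset is needed.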

    Main Results:

    • The proposed algorithm directly learns mask-guided sequential segmentation from unlabeled videos.
    • Achieved state-of-the-art performance on three standard benchmarks: DAVIS$_{17}$, YouTube-VOS, and VIP.
    • Significantly narrowed the performance gap between self-supervised and fully-supervised VOS methods.
    • Demonstrated effective generation of pseudo-labels without compromising training speed.

    Conclusions:

    • The developed self-supervised framework offers a powerful alternative to supervised VOS.
    • The novel approach of online pseudo-label generation and object-level context modeling is effective.
    • This work advances the field of self-supervised learning for video object segmentation.