Related Experiment Video

Updated: Jun 18, 2025

Swin-PSAxialNet: An Efficient Multi-Organ Segmentation Technique
04:48

Published on: July 5, 2024

381

Transformer-Based Visual Segmentation: A Survey.

Xiangtai Li, Henghui Ding, Haobo Yuan

    IEEE Transactions on Pattern Analysis and Machine Intelligence
    |July 29, 2024
    PubMed
    Summary
    This summary is machine-generated.

    Transformer-based models have surpassed earlier convolutional and recurrent approaches in visual segmentation, which underpins applications such as autonomous driving and medical analysis. This survey covers their architecture, applications, and future research directions.

    More Related Videos

    A Swin Transformer-Based Model for Thyroid Nodule Detection in Ultrasound Images
    04:23

    Published on: April 21, 2023

    1.8K
    Application of Deep Learning-Based Medical Image Segmentation via Orbital Computed Tomography
    04:48

    Published on: November 30, 2022

    2.7K

    Area of Science:

    • Computer Vision
    • Artificial Intelligence
    • Machine Learning

    Background:

    • Visual segmentation partitions images/videos into meaningful groups for applications like autonomous driving and medical analysis.
    • Deep learning methods have advanced visual segmentation significantly over the last decade.
    • Transformers, originally designed for natural language processing (NLP), now perform strongly on vision tasks, surpassing earlier convolutional and recurrent approaches.

    Purpose of the Study:

    • To provide a comprehensive overview of transformer-based visual segmentation methods.
    • To summarize recent advancements and unify transformer-based approaches under a single meta-architecture.
    • To explore specific subfields and identify future research directions.

    Main Methods:

    • Review of background, including problem definitions, datasets, and prior convolutional methods.
    • Summary of a unifying meta-architecture for transformer-based visual segmentation.
    • Examination of various method designs, modifications, and applications based on the meta-architecture.
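
The summary does not spell out what the meta-architecture contains. As a rough illustration only, the sketch below assumes a query-based design that is common in transformer segmentation models: each learned object query is compared with per-pixel features by dot product to produce one mask per query. The names `predict_masks` and `assign_labels` and the toy values are hypothetical and are not taken from the survey.

```python
import math

def dot(a, b):
    # Inner product of two equal-length vectors.
    return sum(x * y for x, y in zip(a, b))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict_masks(queries, pixel_features):
    """Return one H x W probability mask per query.

    queries: list of Q vectors of length C (assumed decoder outputs).
    pixel_features: H x W grid of C-dimensional vectors (assumed backbone outputs).
    """
    return [
        [[sigmoid(dot(q, feat)) for feat in row] for row in pixel_features]
        for q in queries
    ]

def assign_labels(masks):
    # Give each pixel the index of the query whose mask scores highest there.
    height, width = len(masks[0]), len(masks[0][0])
    return [
        [max(range(len(masks)), key=lambda k: masks[k][i][j]) for j in range(width)]
        for i in range(height)
    ]

# Toy 2 x 2 image with 2-dimensional features and two queries.
pixel_features = [[[1.0, 0.0], [0.0, 1.0]],
                  [[1.0, 0.0], [0.0, 1.0]]]
queries = [[2.0, -2.0], [-2.0, 2.0]]

masks = predict_masks(queries, pixel_features)
labels = assign_labels(masks)
print(labels)
```

In this toy setup the left column matches the first query and the right column matches the second, so the two queries split the image into two segments. Real models learn both the queries and the features; here they are hand-picked.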

    Main Results:

    • Transformers offer robust, unified, and simpler solutions for diverse segmentation tasks.
    • Detailed examination of specific subfields: 3D point cloud segmentation, foundation model tuning, domain-aware segmentation, efficient segmentation, and medical segmentation.
    • Re-evaluation of reviewed methods on established datasets.
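
The summary does not say which metrics the re-evaluation uses. Mean intersection over union (mIoU) is a common choice for semantic segmentation, so the sketch below assumes it as an example of how a predicted label map might be scored against ground truth. The function name `mean_iou` and the toy label maps are hypothetical.

```python
def mean_iou(pred, target, num_classes):
    """Average per-class IoU over classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = 0
        union = 0
        for pred_row, target_row in zip(pred, target):
            for p, t in zip(pred_row, target_row):
                inter += int(p == c and t == c)  # pixel is class c in both maps
                union += int(p == c or t == c)   # pixel is class c in either map
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0

# Toy 2 x 2 label maps with two classes.
pred = [[0, 1],
        [0, 1]]
target = [[0, 1],
          [1, 1]]

score = mean_iou(pred, target, num_classes=2)
print(round(score, 4))
```

Class 0 overlaps on one of two pixels (IoU 1/2) and class 1 on two of three (IoU 2/3), so the mean is 7/12.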

    Conclusions:

    • Transformer-based methods represent a significant advancement in visual segmentation.
    • The survey highlights key challenges and proposes future research avenues in this rapidly evolving field.