Updated: Apr 4, 2026

WeakTr: Exploring Plain Vision Transformer for Weakly-Supervised Semantic Segmentation.

Lianghui Zhu, Yingyue Li, Jiemin Fang

IEEE Transactions on Image Processing: A Publication of the IEEE Signal Processing Society

April 2, 2026

Summary

This summary is machine-generated.

This study uses weakly-supervised semantic segmentation (WSSS) and class activation maps (CAM) to analyze how plain Vision Transformers (ViTs) work. The proposed method, WeakTr, adaptively fuses self-attention maps to produce CAMs that cover objects more completely, reaching 78.5% mIoU on PASCAL VOC 2012 and 51.1% mIoU on COCO 2014.

Area of Science:

Computer Vision
Deep Learning
Artificial Intelligence

Background:

Transformers, particularly Vision Transformers (ViTs), excel at computer vision tasks.
Understanding how ViTs work internally is crucial for advancing the field.
Weakly-supervised semantic segmentation (WSSS) and Class Activation Maps (CAM) provide a setting in which ViT behavior can be analyzed.

Purpose of the Study:

To analyze the working mechanism of Vision Transformers (ViTs) using WSSS and CAM.
To propose a novel method for adaptively fusing self-attention maps from ViTs for improved WSSS and CAM generation.
To introduce an efficient and scalable ViT-based gradient clipping decoder for online retraining.

Main Methods:

Utilized a plain ViT pre-trained on ImageNet for analysis.
Developed a method to estimate attention head importance and adaptively fuse self-attention maps.
Proposed a ViT-based gradient clipping decoder for efficient online retraining.
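
The idea of weighting attention heads by importance and fusing their maps can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `fuse_attention_maps` and `refine_cam`, the softmax normalization of the scores, and the assumption that importance scores are supplied as a ready-made input are all hypothetical choices made for illustration. The summary does not describe how the scores are estimated.

```python
import numpy as np

def fuse_attention_maps(attn, head_scores):
    """Fuse K attention maps into one map using importance weights.

    attn:        (K, N, N) patch-to-patch attention maps, one per
                 layer/head pair (hypothetical layout).
    head_scores: (K,) unnormalized importance scores (assumed given).
    Returns an (N, N) weighted average of the input maps.
    """
    attn = np.asarray(attn, dtype=float)
    scores = np.asarray(head_scores, dtype=float)
    # Softmax turns the scores into positive weights that sum to 1.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Contract the head axis: sum_k weights[k] * attn[k].
    return np.tensordot(weights, attn, axes=1)

def refine_cam(coarse_cam, fused_attn):
    """Propagate per-patch class scores along the fused attention.

    coarse_cam: (N, C) class activation scores for N patches.
    fused_attn: (N, N) fused attention map.
    Returns an (N, C) refined map.
    """
    return fused_attn @ np.asarray(coarse_cam, dtype=float)
```

With equal scores the fusion reduces to a plain average over heads, which is the non-adaptive baseline; unequal scores let informative heads dominate the fused map.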

Main Results:

Multi-layer, multi-head self-attention maps provide rich information for WSSS and CAM.
The proposed adaptive fusion method generates higher-quality CAMs with more complete objects.
The WeakTr method achieved superior WSSS performance: 78.5% mIoU on PASCAL VOC 2012 and 51.1% mIoU on COCO 2014.
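
For readers unfamiliar with the metric, mean intersection-over-union (mIoU) averages the per-class ratio of correctly labeled pixels to the union of predicted and ground-truth pixels. The sketch below is a generic illustration, not the benchmark's official evaluation code: the function name `mean_iou` is hypothetical, and the default `ignore_index=255` assumes the common convention of marking unlabeled pixels with 255. Benchmark scores are normally computed from a confusion matrix accumulated over the whole dataset rather than per image.

```python
import numpy as np

def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over the classes that appear in pred or target.

    pred, target: integer label arrays of the same shape, with
                  pred values in [0, num_classes).
    """
    pred = np.asarray(pred, dtype=np.int64).ravel()
    target = np.asarray(target, dtype=np.int64).ravel()
    # Drop pixels that carry the ignore label in the ground truth.
    keep = target != ignore_index
    pred, target = pred[keep], target[keep]
    # confusion[i, j] counts pixels of true class i predicted as j.
    confusion = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)
    intersection = np.diag(confusion)
    union = confusion.sum(axis=0) + confusion.sum(axis=1) - intersection
    present = union > 0  # skip classes absent from both arrays
    return float((intersection[present] / union[present]).mean())
```

For a 2x2 image with ground truth `[[0, 0], [1, 1]]` and prediction `[[0, 1], [1, 1]]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.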

Conclusions:

Vision Transformers' self-attention maps contain valuable information for semantic segmentation and object localization.
The proposed adaptive fusion and gradient clipping decoder enhance ViT performance in WSSS.
The WeakTr method demonstrates significant improvements on standard WSSS benchmarks.