TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization.

Yuan Yao, Fang Wan, Wei Gao

IEEE Transactions on Neural Networks and Learning Systems

November 23, 2022

Summary

This summary is machine-generated.

This study introduces the Vision Transformer to weakly supervised object localization (WSOL), overcoming the difficulty CNNs have in capturing the full extent of objects. The proposed Token Semantic Coupled Attention Map (TS-CAM) method significantly improves object localization accuracy and also performs well on multicategory localization.

Area of Science:

Computer Vision
Machine Learning
Artificial Intelligence

Background:

Weakly supervised object localization (WSOL), which trains with only image-level category labels and no bounding-box annotations, is challenging.
Convolutional Neural Networks (CNNs) often fail to localize the full object extent, focusing instead on discriminative parts due to difficulties in capturing long-range semantic dependencies.

Purpose of the Study:

To address the limitations of CNNs in WSOL by leveraging the Vision Transformer (ViT).
To propose a novel method, Token Semantic Coupled Attention Map (TS-CAM), for improved object localization.

Main Methods:

Introduced the Vision Transformer to WSOL to capture long-range semantic dependencies via self-attention.
Developed TS-CAM, which decomposes class-aware semantics and couples them with attention maps for semantic-aware activation.
Employed spatial embedding by partitioning images into patch tokens and reallocating category semantics to these tokens for improved long-distance feature capture.
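The three steps above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The following are all illustrative assumptions:

- the function names (`patchify`, `attention_layer`, `coupled_attention_map`);
- the random weights;
- the single attention head;
- the per-token linear classifier `w_cls`, which stands in for the network's semantic head.

MLP and normalization layers are also omitted. The sketch shows the image split into patch tokens, the class token's attention to each patch averaged over layers, and the element-wise coupling of that semantic-agnostic attention with one class's per-token scores.

```python
import numpy as np

# Hypothetical sketch of TS-CAM-style coupling; random weights stand in for a
# trained Vision Transformer, and all names here are illustrative.


def patchify(image: np.ndarray, patch_size: int):
    """Split an H x W x C image into non-overlapping, flattened patch tokens."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0, "image must tile evenly"
    gh, gw = h // patch_size, w // patch_size
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(gh * gw, -1)
    return patches, (gh, gw)


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_layer(tokens, w_q, w_k, w_v):
    """One single-head self-attention layer with a residual connection.

    Returns the updated tokens and the (N+1) x (N+1) attention weights.
    """
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return tokens + attn @ v, attn


def coupled_attention_map(patch_tokens, attn_maps, w_cls, grid, class_idx):
    """Couple per-token class semantics with class-token attention (sketch)."""
    # Semantic-agnostic attention: how strongly the class token (row 0)
    # attends to each patch token (columns 1..N), averaged over layers.
    attn = np.mean([a[0, 1:] for a in attn_maps], axis=0)  # (N,)
    # Semantic re-allocation (assumed linear head): score each patch per class.
    semantic = patch_tokens @ w_cls  # (N, num_classes)
    # Coupling: element-wise product yields a semantic-aware map for one class.
    coupled = (semantic[:, class_idx] * attn).reshape(grid)
    # Min-max normalize to [0, 1] so the map can later be thresholded.
    return (coupled - coupled.min()) / (coupled.max() - coupled.min() + 1e-12)


rng = np.random.default_rng(0)
dim, num_classes, depth = 64, 5, 2
image = rng.random((32, 32, 3))
patches, grid = patchify(image, patch_size=8)  # 16 tokens of 8*8*3 = 192 values
w_embed = rng.normal(size=(patches.shape[1], dim)) / np.sqrt(patches.shape[1])
cls_token = rng.normal(size=(1, dim))
tokens = np.vstack([cls_token, patches @ w_embed])  # (17, dim): class token + patches

attn_maps = []
for _ in range(depth):
    w_q, w_k, w_v = rng.normal(size=(3, dim, dim)) / np.sqrt(dim)
    tokens, attn = attention_layer(tokens, w_q, w_k, w_v)
    attn_maps.append(attn)

w_cls = rng.normal(size=(dim, num_classes))
cam = coupled_attention_map(tokens[1:], attn_maps, w_cls, grid, class_idx=2)
print(cam.shape)  # (4, 4)
```

In this sketch, the attention term comes from self-attention over all patches, so it is not restricted to local neighborhoods. The per-token class scores decide which category the activation belongs to. Multiplying the two turns a category-agnostic attention map into a class-specific one.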

Main Results:

TS-CAM significantly outperformed its CNN-based counterparts, with improvements of 11.6% and 28.9% on the ILSVRC and CUB-200-2011 datasets, respectively.
Demonstrated state-of-the-art performance in WSOL.
Showcased superior performance for multicategory object localization on the Pascal VOC dataset.
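This summary reports the gains without naming the metric. WSOL results are commonly reported as localization accuracy, where the activation map is thresholded, a box is drawn around the activated region, and the prediction counts as correct when that box overlaps the ground-truth box with an intersection over union (IoU) of at least 0.5.

The sketch below assumes that convention. It is not stated here that the percentages above use exactly this metric. The following are illustrative choices:

- the 0.5 threshold on a normalized map;
- a tight box around all above-threshold pixels, rather than the largest connected component;
- the function names.

The toy example also shows why a map that fires only on a discriminative part, as described in the Background, fails this test.

```python
import numpy as np

# Hypothetical helpers for a common WSOL evaluation convention (IoU >= 0.5).


def map_to_box(cam: np.ndarray, threshold: float = 0.5):
    """Tight box (x0, y0, x1, y1), end-exclusive, around every pixel with cam >= threshold."""
    ys, xs = np.nonzero(cam >= threshold)
    if ys.size == 0:
        return (0, 0, 0, 0)  # nothing activated: empty box
    return (int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1)


def box_area(box) -> int:
    return max(0, box[2] - box[0]) * max(0, box[3] - box[1])


def iou(a, b) -> float:
    """Intersection over union of two end-exclusive (x0, y0, x1, y1) boxes."""
    inter = box_area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def localization_correct(cam, gt_box, threshold=0.5, iou_min=0.5) -> bool:
    """Count a prediction as correct when its box overlaps the ground truth by IoU >= iou_min."""
    return iou(map_to_box(cam, threshold), gt_box) >= iou_min


# Toy 12 x 12 example: the object occupies rows/columns 2..9.
gt = (2, 2, 10, 10)
partial = np.zeros((12, 12))
partial[2:5, 2:5] = 1.0  # activation on a small discriminative part only
full = np.zeros((12, 12))
full[2:10, 2:10] = 1.0  # activation covering the whole object

print(iou(map_to_box(partial), gt))  # 0.140625
print(localization_correct(partial, gt))  # False
print(localization_correct(full, gt))  # True
```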

Conclusions:

The Vision Transformer, through the TS-CAM method, effectively captures long-range semantic dependencies for robust object localization.
TS-CAM overcomes the limitations of CNNs in WSOL, providing accurate localization of full object extents.
The proposed approach offers significant advancements in both single- and multicategory weakly supervised object localization.