

Perception Assisted Transformer for Unsupervised Object Re-Identification.

Shuoyi Chen, Mang Ye, Xingping Dong

IEEE Transactions on Image Processing: a Publication of the IEEE Signal Processing Society

March 27, 2025

Summary

This summary is machine-generated.


This study introduces a Transformer-based framework for unsupervised object re-identification (Re-ID) that enhances feature learning with a novel target-aware mask alignment strategy. Without using identity annotations, the proposed method outperforms state-of-the-art methods and matches or exceeds many supervised approaches.

Area of Science:

Computer Vision
Machine Learning
Artificial Intelligence

Background:

Unsupervised object re-identification (Re-ID) traditionally uses Convolutional Neural Networks (CNNs) for feature extraction and pseudo-labeling.
CNNs have limitations in capturing long-range dependencies and integrating global information, hindering performance in complex scenarios.
Vision Transformers (ViTs), which relate every image patch to every other patch through self-attention, offer greater robustness than CNNs and more flexible modeling of diverse data structures, showing promise for Re-ID tasks.
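The long-range-dependency argument can be made concrete with a toy comparison. The sketch below is a plain-Python illustration, not code from the paper: a single self-attention layer (simplified by assuming identity query, key, and value projections) next to a single 3-tap convolution over a row of patch tokens. Changing only the first token alters the attention output at the last position but leaves the convolution output there untouched, because one convolution layer sees only neighboring tokens; a CNN has to stack many layers before its receptive field spans the whole image. All function names and values are hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head self-attention; Q, K, V projections assumed to be identity.
    Every output row is a weighted mix of ALL input tokens."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, tokens)) for c in range(d)])
    return out

def conv1d(tokens, kernel=(0.25, 0.5, 0.25)):
    """Per-channel 1-D convolution with zero padding; each output row
    depends only on the token and its immediate neighbors."""
    n, d = len(tokens), len(tokens[0])
    r = len(kernel) // 2
    out = []
    for i in range(n):
        row = [0.0] * d
        for k, w in enumerate(kernel):
            j = i + k - r
            if 0 <= j < n:
                for c in range(d):
                    row[c] += w * tokens[j][c]
        out.append(row)
    return out

def changed(a, b, tol=1e-12):
    return any(abs(x - y) > tol for x, y in zip(a, b))

tokens = [[0.1 * i, 1.0 - 0.1 * i] for i in range(8)]  # 8 toy "patch" tokens
perturbed = [row[:] for row in tokens]
perturbed[0] = [2.0, -1.0]  # modify only the first (left-most) token

attn_reaches_last = changed(self_attention(tokens)[-1], self_attention(perturbed)[-1])
conv_reaches_last = changed(conv1d(tokens)[-1], conv1d(perturbed)[-1])
print(attn_reaches_last, conv_reaches_last)  # True False
```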

Purpose of the Study:

To explore the potential of Vision Transformers in unsupervised object re-identification (Re-ID).
To propose a novel Transformer-based framework (PAT) that enhances feature learning beyond category-level supervision.
To improve fine-grained feature alignment and instance-level discriminative learning in unsupervised Re-ID.
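The Purpose items contrast category-level supervision with instance-level discriminative learning. The summary does not give PAT's loss functions, so the following is only a generic illustration, assuming the widely used InfoNCE contrastive loss. With category-level supervision the positive is the centroid of the sample's pseudo-label cluster; with instance-level learning the positive is another augmented view of the same image, and other images, even ones from the same cluster, act as negatives. The function name `info_nce`, the temperature of 0.1, and the toy vectors are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE: -log(exp(s_pos/t) / (exp(s_pos/t) + sum_k exp(s_neg_k/t))), s = cosine."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    m = max(logits)  # stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]

# Toy 2-D embeddings (hypothetical values).
view_1 = [1.0, 0.2]      # image x, augmentation 1 (the anchor)
view_2 = [0.9, 0.3]      # image x, augmentation 2
neighbor = [0.8, -0.3]   # a different image sharing x's pseudo-label
outsider = [-1.0, 0.5]   # an image from another cluster (its only member)

# Category-level: pull x toward its own cluster centroid, away from other centroids.
own_centroid = [(a + b) / 2 for a, b in zip(view_1, neighbor)]
category_loss = info_nce(view_1, own_centroid, [outsider])

# Instance-level: only x's second view is positive; the same-cluster neighbor is a negative.
instance_loss = info_nce(view_1, view_2, [neighbor, outsider])
print(f"category-level: {category_loss:.3f}, instance-level: {instance_loss:.3f}")
```

Because the same-cluster neighbor becomes a negative, the instance-level loss is clearly larger on these toy vectors, which is the extra pressure toward finer-grained distinctions that category-level labels alone do not provide.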


Main Methods:

Proposed a Transformer-based perception-assisted framework (PAT) for unsupervised Re-ID.
Introduced a target-aware mask alignment (TMA) strategy to leverage low-level visual cues and guide fine-grained feature alignment using pseudo-labels.
Developed a perceptual fusion feature augmentation (PFA) method to optimize instance-level discriminative learning.
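Pseudo-labels drive both the baseline pipeline described in the Background and the TMA strategy, but the summary does not say how PAT produces them. Unsupervised Re-ID pipelines commonly cluster the current embeddings (DBSCAN is a frequent choice) and use the cluster ids as training labels. The sketch below swaps in a simpler threshold-graph clustering so that it runs in plain Python; `pseudo_labels`, the 0.9 threshold, and the toy features are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def pseudo_labels(features, threshold=0.9):
    """Link every pair with cosine similarity >= threshold, take connected
    components (union-find) as clusters, and return one id per sample.
    Samples left in a singleton component get -1 (treated as outliers)."""
    parent = list(range(len(features)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            if cosine(features[i], features[j]) >= threshold:
                parent[find(i)] = find(j)  # merge the two components

    roots = [find(i) for i in range(len(features))]
    sizes = {}
    for r in roots:
        sizes[r] = sizes.get(r, 0) + 1
    ids = {}  # component root -> consecutive cluster id
    return [ids.setdefault(r, len(ids)) if sizes[r] > 1 else -1 for r in roots]

features = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0], [0.05, 0.98], [-1.0, -1.0]]
print(pseudo_labels(features))  # [0, 0, 1, 1, -1]
```

Marking singletons as -1 mirrors DBSCAN's noise label; whether such samples are dropped from training or kept as their own classes is a design choice that varies between methods.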

Main Results:

On multiple Re-ID datasets, the PAT framework surpassed state-of-the-art methods in both accuracy and robustness.
The proposed TMA strategy effectively incorporates local pixel information for improved discriminative feature learning.
The method achieved results comparable to or better than many supervised Re-ID approaches, despite being unsupervised.
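The summary does not name the evaluation metrics. Re-ID results are conventionally reported as mean average precision (mAP) and Rank-1 accuracy (the first point of the CMC curve), so here is a minimal plain-Python sketch of how those two numbers come from query-to-gallery distances. The function names and toy data are hypothetical, and the sketch omits the usual protocol step that discards gallery images sharing both the identity and the camera of the query.

```python
def rank1_and_ap(distances, gallery_ids, query_id):
    """Rank-1 hit (1.0 or 0.0) and average precision for one query.
    distances: query-to-gallery distances (smaller = more similar)."""
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    matches = [gallery_ids[i] == query_id for i in order]
    num_relevant = sum(matches)
    if num_relevant == 0:
        return None  # no true match in the gallery; skip this query
    hits, precision_sum = 0, 0.0
    for rank, is_match in enumerate(matches, start=1):
        if is_match:
            hits += 1
            precision_sum += hits / rank  # precision at each correct retrieval
    return float(matches[0]), precision_sum / num_relevant

def evaluate(queries):
    """queries: list of (distances, gallery_ids, query_id); returns (Rank-1, mAP)."""
    results = [r for r in (rank1_and_ap(*q) for q in queries) if r is not None]
    rank1 = sum(r[0] for r in results) / len(results)
    mean_ap = sum(r[1] for r in results) / len(results)
    return rank1, mean_ap

queries = [
    ([0.2, 0.5, 0.1, 0.9], [7, 3, 3, 7], 7),  # nearest gallery item is a wrong identity
    ([0.3, 0.1], [4, 5], 5),                  # nearest gallery item is correct
]
print(evaluate(queries))  # (0.5, 0.75)
```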

Conclusions:

Vision Transformers are highly effective for unsupervised object re-identification, particularly when combined with strategies that enhance fine-grained feature learning.
The proposed PAT framework, incorporating TMA and PFA, offers a powerful approach to unsupervised Re-ID by balancing instance-level discriminative learning with fine-grained, detail-aware feature alignment.
The method's ability to achieve strong performance without identity annotations highlights its potential for practical applications.