Fast-iTPN: Integrally Pre-Trained Transformer Pyramid Network With Token Migration
- Yunjie Tian , Lingxi Xie , Jihao Qiu , Jianbin Jiao , Yaowei Wang , Qi Tian , Qixiang Ye
Summary
This summary is machine-generated. Fast-iTPN is a vision transformer backbone designed to minimize the transfer gap between upstream representation learning (pre-training) and downstream tasks. Its efficient design accelerates inference by up to 70% with negligible performance loss.
Area Of Science
- Computer Vision
- Deep Learning
- Artificial Intelligence
Background
- Vision Transformer (ViT) models have shown great promise but often face challenges in bridging the gap between representation learning and downstream tasks.
- Existing methods may incur significant computational overhead and slow inference speeds.
Purpose Of The Study
- To propose an integrally pre-trained transformer pyramid network (iTPN) that jointly optimizes the network backbone and neck for minimal transfer gap.
- To introduce Fast-iTPN, an efficient variant that reduces computational memory and accelerates inference.
Main Methods
- iTPN jointly pre-trains a feature pyramid (neck) together with the ViT backbone, reportedly the first pre-trained feature pyramid on ViT, using multi-stage supervision via masked feature modeling (MFM).
- Fast-iTPN incorporates token migration and token gathering techniques to reduce computational costs and memory overhead.
- The model was evaluated on ImageNet-1K, COCO object detection, and ADE20K semantic segmentation benchmarks.
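To make the token migration idea in the methods above concrete, below is a minimal conceptual sketch (not the authors' implementation): low-importance tokens are dropped from the sequence to cut compute, but each dropped token's content is "migrated" by folding it into its most similar kept token, so information is preserved rather than discarded. The scoring and merging rules here are illustrative assumptions.

```python
def migrate_tokens(tokens, scores, keep_ratio=0.7):
    """Illustrative token migration: keep the highest-scoring tokens and
    fold each discarded token into its most similar kept token.

    tokens: list of D-dimensional vectors (lists of floats)
    scores: per-token importance scores (e.g. attention mass)
    keep_ratio: fraction of tokens retained after migration
    """
    n_keep = max(1, int(len(tokens) * keep_ratio))
    # Rank tokens by importance, most important first.
    order = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    keep_idx, drop_idx = order[:n_keep], order[n_keep:]
    kept = [list(tokens[i]) for i in keep_idx]
    for i in drop_idx:
        # Find the kept token most similar (by dot product) to the dropped one.
        j = max(range(len(kept)),
                key=lambda k: sum(a * b for a, b in zip(kept[k], tokens[i])))
        # Merge the dropped token into it (simple average as a stand-in).
        kept[j] = [(a + b) / 2.0 for a, b in zip(kept[j], tokens[i])]
    return kept

# Example: four 2-D tokens, keep half; the two dropped tokens are each
# averaged into the nearest surviving token.
tokens = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
scores = [4.0, 3.0, 2.0, 1.0]
reduced = migrate_tokens(tokens, scores, keep_ratio=0.5)
```

The real model applies this inside the transformer, shrinking the token sequence at chosen layers so later attention blocks run on fewer tokens, which is where the inference speedup comes from.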
Main Results
- Fast-iTPN achieved high top-1 accuracy on ImageNet-1K (88.75%/89.5% for base/large).
- On COCO object detection and ADE20K semantic segmentation, Fast-iTPN demonstrated competitive performance (58.4/58.8 box AP and 57.5/58.7 mIoU for base/large, respectively).
- Inference speed was accelerated by up to 70% with negligible performance degradation.
Conclusions
- Fast-iTPN offers an efficient and effective backbone for various downstream computer vision tasks.
- The proposed methods significantly improve inference speed without compromising accuracy.
- This work presents a powerful and practical solution for real-world vision applications.

