ChatTracker: Enhancing Visual Tracking via LLM-Driven Iterative Description Refinement.

Yu Zhang, Yiming Sun, Mi Zhang

    IEEE Transactions on Pattern Analysis and Machine Intelligence | March 16, 2026 | PubMed
    Summary
    This summary is machine-generated.

    ChatTracker improves vision-language (VL) tracking by using multimodal large language models (MLLMs) to generate accurate language descriptions, addressing the inaccurate and ambiguous manual annotations in current datasets. The approach improves tracking accuracy and also benefits other visual tasks.

    Area of Science:

    • Computer Vision
    • Artificial Intelligence
    • Natural Language Processing

    Background:

    • Vision-Language (VL) trackers leverage natural language for enhanced object tracking.
    • Current VL trackers underperform visual trackers due to inaccurate and ambiguous manual annotations.
    • Over 10% of annotations in existing VL tracking datasets are inaccurate.

    Purpose of the Study:

    • To address the annotation quality issue in VL tracking datasets.
    • To propose ChatTracker, a novel framework utilizing Multimodal Large Language Models (MLLMs) for high-quality description generation.
    • To enhance the accuracy and versatility of VL tracking.

    Main Methods:

    • Leveraging MLLMs for generating accurate language descriptions.
    • Introducing a Reflection-based Language Description Refinement Module for iterative refinement.
    • Developing a plug-and-play framework to integrate MLLM-generated descriptions into existing trackers.
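The iterative refinement idea in the methods above can be illustrated with a minimal sketch. This is not the authors' implementation: the summary does not specify how the module scores or rewrites descriptions, so `refine_description`, `toy_describe`, `toy_score`, the acceptance threshold, and the round limit are all hypothetical stand-ins. In a real system the two callables would presumably wrap an MLLM and some text-to-image alignment measure.

```python
from typing import Callable, Tuple

# Hypothetical interfaces (assumptions, not from the paper):
#   Describe: (previous description, feedback text) -> revised description
#   Score:    description -> alignment score in [0, 1]
Describe = Callable[[str, str], str]
Score = Callable[[str], float]


def refine_description(
    initial: str,
    describe: Describe,
    score: Score,
    threshold: float = 0.8,   # assumed acceptance level
    max_rounds: int = 5,      # assumed iteration budget
) -> Tuple[str, float]:
    """Keep asking for a revised description until it scores well enough."""
    best_desc, best_score = initial, score(initial)
    for _ in range(max_rounds):
        if best_score >= threshold:
            break
        # Feed the current shortfall back to the description generator.
        feedback = f"score {best_score:.2f} is below {threshold:.2f}; be more specific"
        candidate = describe(best_desc, feedback)
        candidate_score = score(candidate)
        # Only accept a revision that actually improves the score.
        if candidate_score > best_score:
            best_desc, best_score = candidate, candidate_score
    return best_desc, best_score


# Toy stand-ins so the sketch runs without any model.
TARGET_WORDS = ["red", "car", "left"]


def toy_score(description: str) -> float:
    """Fraction of the target words that appear in the description."""
    words = set(description.lower().split())
    return len(words & set(TARGET_WORDS)) / len(TARGET_WORDS)


def toy_describe(previous: str, feedback: str) -> str:
    """Append the first target word that is still missing."""
    present = previous.lower().split()
    for word in TARGET_WORDS:
        if word not in present:
            return f"{previous} {word}"
    return previous


final_description, final_score = refine_description("car", toy_describe, toy_score)
print(final_description, final_score)
```

Passing the generator and scorer in as callables mirrors the plug-and-play goal stated above: the loop does not depend on any particular tracker or model, so either component could be swapped without changing the refinement logic.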

    Main Results:

    • ChatTracker achieves performance comparable to state-of-the-art (SoTA) trackers.
    • Generated descriptions improve VL tracker performance and text-to-image alignment.
    • The framework enhances performance in Referring Expression Comprehension (REC), Referring Expression Segmentation (RES), and Referring Video Object Segmentation (R-VOS).

    Conclusions:

    • ChatTracker effectively addresses annotation inaccuracies in VL tracking.
    • The proposed framework demonstrates universality across various visual tasks.
    • ChatTracker offers a promising direction for advancing VL tracking and related visual AI applications.