Towards Universal Modal Tracking With Online Dense Temporal Token Learning.

Yaozong Zheng, Bineng Zhong, Qihua Liang

IEEE Transactions on Pattern Analysis and Machine Intelligence

July 29, 2025

Summary

This summary is machine-generated.

We introduce UM-ODTrack, a universal video-level modality-aware tracking model. This model supports diverse tracking tasks with a single architecture, achieving state-of-the-art performance by leveraging temporal token learning.

Area of Science:

Computer Vision
Artificial Intelligence
Machine Learning

Background:

Multi-modal tracking enhances robustness by integrating diverse sensor data (e.g., RGB, thermal, depth, event).
Existing multi-modal trackers often require task-specific architectures and extensive, independent training.
A unified approach for various tracking modalities remains a significant challenge.

Purpose of the Study:

To develop a universal video-level modality-aware tracking model (UM-ODTrack) adaptable to multiple tracking tasks.
To enable a single model architecture and parameter set to handle RGB, RGB+Thermal, RGB+Depth, and RGB+Event tracking.
To improve tracking performance and reduce training complexity through novel temporal token learning and cross-modal fusion.

Main Methods:

Video-level sampling to capture broader temporal context.
Online dense temporal token association for appearance and motion propagation.
Gated perceivers with attention mechanisms for adaptive cross-modal representation learning.
One-shot training for modality-scalable multi-task inference.
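
As a rough illustration of the gated cross-modal fusion idea listed above, the following minimal sketch lets RGB tokens attend over tokens from an auxiliary modality (thermal, depth, or event) and scales the result with a sigmoid gate before adding it back. It is not the paper's gated perceiver: the function name `gated_cross_attention`, the scalar `gate_logit`, and the plain dot-product attention are all assumptions made for the example. When no auxiliary tokens are supplied, the RGB tokens pass through unchanged, which mirrors how one model could serve both RGB-only and RGB+X inputs.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_cross_attention(rgb_tokens, aux_tokens, gate_logit):
    """Fuse auxiliary-modality tokens into RGB tokens (hypothetical sketch).

    rgb_tokens, aux_tokens: lists of equal-length float vectors.
    gate_logit: scalar; sigmoid(gate_logit) controls how much
    auxiliary context is added to each RGB token.
    """
    if not aux_tokens:
        # RGB-only input: nothing to fuse, return a copy unchanged.
        return [list(token) for token in rgb_tokens]

    gate = 1.0 / (1.0 + math.exp(-gate_logit))
    dim = len(rgb_tokens[0])
    fused = []
    for query in rgb_tokens:
        # Scaled dot-product scores of this RGB token against every aux token.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in aux_tokens
        ]
        weights = softmax(scores)
        # Attention-weighted average of the auxiliary tokens.
        context = [
            sum(w * key[j] for w, key in zip(weights, aux_tokens))
            for j in range(dim)
        ]
        # Gated residual: RGB token plus a gated share of the context.
        fused.append([q + gate * c for q, c in zip(query, context)])
    return fused


rgb = [[1.0, 0.0]]
thermal = [[2.0, 4.0]]
print(gated_cross_attention(rgb, thermal, gate_logit=0.0))
print(gated_cross_attention(rgb, [], gate_logit=0.0))
```

In a trained model the gate would typically be learned and could vary per token or per channel; a single scalar is used here only to keep the example short.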

Main Results:

UM-ODTrack achieves state-of-the-art (SOTA) performance on visible and multi-modal benchmarks.
The model effectively leverages previous frame information as temporal prompts for future inference.
The one-shot training scheme significantly reduces training burden while enhancing model representation.
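
The idea of passing previous-frame information forward as a temporal prompt can be sketched as a simple loop in which each frame receives a token from the frame before it and emits an updated token for the next. The sketch below is a toy under stated assumptions: `track_frame`, `track_video`, the fixed `mix` weight, and the mean used as a stand-in "prediction" are hypothetical, whereas the actual model would update learned tokens through attention and decode a bounding box.

```python
def track_frame(frame_feature, temporal_token, mix=0.5):
    """Process one frame (toy stand-in for a tracker forward pass).

    Blends the current frame feature with the token carried over from
    earlier frames, then returns a scalar prediction and the updated token.
    """
    fused = [
        (1.0 - mix) * f + mix * t
        for f, t in zip(frame_feature, temporal_token)
    ]
    prediction = sum(fused) / len(fused)  # placeholder for a box head
    return prediction, fused


def track_video(frame_features, init_token):
    """Run frames in order, propagating the temporal token online."""
    token = init_token
    predictions = []
    for feature in frame_features:
        prediction, token = track_frame(feature, token)
        predictions.append(prediction)
    return predictions, token


frames = [[1.0, 1.0], [0.0, 0.0]]
predictions, final_token = track_video(frames, init_token=[0.0, 0.0])
print(predictions)  # the second value is nonzero only because of the first frame
```

The second frame's feature is all zeros, so any nonzero prediction for it comes entirely from the token propagated from the first frame, which is the effect the temporal prompt is meant to provide.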

Conclusions:

UM-ODTrack offers a unified and efficient solution for diverse video tracking tasks.
The proposed method demonstrates superior performance and generalization capabilities across different modalities.
This work advances the field of multi-modal visual tracking with a scalable and effective approach.