End-to-End Open-Vocabulary Video Visual Relationship Detection Using Multi-Modal Prompting.

Yongqi Wang, Xinxiao Wu, Shuo Yang

IEEE Transactions on Pattern Analysis and Machine Intelligence

|April 16, 2025

Summary

This summary is machine-generated.

This study introduces a unified framework for open-vocabulary video visual relationship detection, enhancing detection of unseen relationships between objects. The novel approach integrates trajectory detection and relationship classification, improving generalization to new object categories.

Area of Science:

Computer Vision
Artificial Intelligence
Machine Learning

Background:

Current open-vocabulary video visual relationship detection methods rely on pre-trained trajectory detectors, limiting generalization to novel object categories.
This dependence leads to performance degradation when encountering unseen objects and relationships.

Purpose of the Study:

To develop an end-to-end open-vocabulary framework unifying object trajectory detection and relationship classification.
To improve the generalization ability of video visual relationship detection systems to novel object categories and relationships.

Main Methods:

Proposing a relationship-aware open-vocabulary trajectory detector built on a query-based Transformer decoder with a distilled CLIP visual encoder.
Integrating a relationship query and auxiliary loss to explicitly perceive object relationships during trajectory detection.
Developing an open-vocabulary relationship classifier with a multi-modal prompting method (spatio-temporal visual and vision-guided language prompting) to adapt CLIP.
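
The classification step above can be illustrated with a minimal sketch of CLIP-style open-vocabulary scoring: a visual feature for a subject-object pair is compared against text embeddings of candidate relationship names, so a new relationship can be added just by supplying its name. This is not the authors' implementation. The function names, the toy 3-dimensional embeddings, the relationship names, and the temperature value are all assumptions made for illustration; in a real system the embeddings would come from CLIP's visual and text encoders after prompting.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def classify_relationship(visual_feature, text_embeddings, temperature=0.07):
    """Return a probability for each candidate relationship name.

    visual_feature: feature of one subject-object trajectory pair (hypothetical).
    text_embeddings: dict mapping relationship name -> text embedding (hypothetical).
    temperature: softmax temperature; 0.07 is an assumed value, not from the paper.
    """
    names = list(text_embeddings)
    logits = [cosine(visual_feature, text_embeddings[n]) / temperature for n in names]
    # Subtract the max logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return {n: e / total for n, e in zip(names, exps)}


# Toy stand-ins for text-encoder outputs. Any name can be added at test time,
# which is what makes the vocabulary "open".
TEXT_EMBEDDINGS = {
    "ride": [0.9, 0.1, 0.0],
    "chase": [0.1, 0.9, 0.1],
    "sit next to": [0.0, 0.2, 0.9],
}

# Toy stand-in for the visual feature of one subject-object pair.
pair_feature = [0.8, 0.2, 0.1]

probs = classify_relationship(pair_feature, TEXT_EMBEDDINGS)
best = max(probs, key=probs.get)
print(best, round(probs[best], 3))
```

Because the label set is just a dictionary of text embeddings, extending it to an unseen relationship requires no change to the scoring code, only a new entry. The prompting methods described above would act on how the visual feature and the text embeddings are produced, which this sketch deliberately leaves out.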

Main Results:

The proposed framework is shown to be effective on the VidVRD and VidOR datasets.
The approach shows strong generalization ability in a challenging cross-dataset scenario.

Conclusions:

The unified framework successfully addresses limitations of existing methods in open-vocabulary video visual relationship detection.
The developed relationship-aware detector and classifier significantly enhance the detection of novel relationships and improve generalization.