


Related Articles

Articles linked to this work by shared authors, journal, and citation graph.


Same author

Comparison of frequency-resolved optical polarization gating induced by molecular alignment and Kerr effects.

Optics letters·2012

Same author

Direct transformation of simple enals to 3,4-disubstituted benzaldehydes under mild reaction conditions via an organocatalytic regio- and chemoselective dimerization cascade.

Chemistry (Weinheim an der Bergstrasse, Germany)·2012

Same author

[Digital anatomy of the perforator flap in the thigh].

Zhonghua zheng xing wai ke za zhi = Zhonghua zhengxing waike zazhi = Chinese journal of plastic surgery·2012

Same author

[Value of methylation-specific mutiplex ligation-dependent probe in the diagnosis of Prader-Willi syndrome].

Zhongguo dang dai er ke za zhi = Chinese journal of contemporary pediatrics·2012

Same author

Elevated local TGF-β1 level predisposes a closed bone fracture to tuberculosis infection.

Medical hypotheses·2012

Same author

Modulation of P-glycoprotein expression by triptolide in adriamycin-resistant K562/A02 cells.

Oncology letters·2012

Same journal

Relation DETR+: Exploring Explicit Position Relation Prior for Dense Prediction.

IEEE transactions on pattern analysis and machine intelligence·2026

Same journal

RBF++: Quantifying and Optimizing Reasoning Boundaries across Measurable and Unmeasurable Capabilities for Chain-of-Thought Reasoning.

IEEE transactions on pattern analysis and machine intelligence·2026

Same journal

CAFE: Cross-View Adaptive Fusion and Cluster Center Enhancement for Robust Multi-View Clustering.

IEEE transactions on pattern analysis and machine intelligence·2026

Same journal

DIVER: Reinforced Diffusion Breaks Imitation Bottlenecks in End-to-End Autonomous Driving.

IEEE transactions on pattern analysis and machine intelligence·2026

Same journal

Ethics-Aware Safe Reinforcement Learning for Rare-Event Risk Control in Interactive Urban Driving.

IEEE transactions on pattern analysis and machine intelligence·2026

Same journal

Learning Shape Anchors for Holistic Indoor Scene Understanding.

IEEE transactions on pattern analysis and machine intelligence·2026


Related Experiment Video

Updated: May 24, 2025

Using Eye Movements Recorded in the Visual World Paradigm to Explore the Online Processing of Spoken Language


Published on: October 13, 2018

Hierarchical Banzhaf Interaction for General Video-Language Representation Learning.

Peng Jin, Hao Li, Li Yuan

IEEE Transactions on Pattern Analysis and Machine Intelligence

March 3, 2025

Summary

This summary is machine-generated.

This study introduces Hierarchical Banzhaf Interaction (HBI) for fine-grained video-language representation learning. HBI models video-text interactions as a multivariate cooperative game, and the resulting representations improve performance on text-video retrieval, video question answering, and video captioning.

More Related Videos

Augmenting Large Language Models via Vector Embeddings to Improve Domain-Specific Responsiveness


Published on: December 6, 2024

Eye Tracking During Visually Situated Language Comprehension: Flexibility and Limitations in Uncovering Visual Context Effects


Published on: November 30, 2018


Area of Science:

Artificial Intelligence
Computer Vision
Natural Language Processing

Background:

Multimodal representation learning, particularly via contrastive learning, is central to many AI systems.
Current video-language representation learning methods rely largely on coarse-grained, global semantic interactions.
Fine-grained multimodal learning is needed to improve representation quality.
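To make the coarse-grained baseline concrete, here is a minimal sketch of the kind of global interaction the background refers to: each video and each text is reduced to a single embedding, and a symmetric InfoNCE contrastive loss pulls matched pairs together while pushing apart the other pairings in the batch. This is a generic, CLIP-style formulation chosen for illustration, not the loss used in the study; the function names, the `temperature` value of 0.05, and the toy vectors are assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def log_softmax_at(row: List[float], k: int) -> float:
    """Numerically stable log(softmax(row)[k])."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return row[k] - log_z


def symmetric_info_nce(videos: Matrix, texts: Matrix, temperature: float = 0.05) -> float:
    """Average of the video-to-text and text-to-video InfoNCE losses.

    videos[i] and texts[i] are global (coarse-grained) embeddings of a matched
    pair; every other pairing in the batch serves as a negative.
    """
    n = len(videos)
    # sim[i][j]: temperature-scaled similarity between video i and text j.
    sim = [[cosine(v, t) / temperature for t in texts] for v in videos]
    # Each video should pick out its own text among all texts in the batch.
    v2t = -sum(log_softmax_at(sim[i], i) for i in range(n)) / n
    # Each text should pick out its own video among all videos in the batch.
    cols = [[sim[i][j] for i in range(n)] for j in range(n)]
    t2v = -sum(log_softmax_at(cols[j], j) for j in range(n)) / n
    return (v2t + t2v) / 2


if __name__ == "__main__":
    videos = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[1.0, 0.0], [0.0, 1.0]]
    swapped = [[0.0, 1.0], [1.0, 0.0]]
    print(symmetric_info_nce(videos, aligned))  # close to 0
    print(symmetric_info_nce(videos, swapped))  # close to 20
```

Because each clip and sentence collapses to one vector, this loss cannot tell which words correspond to which parts of a video; that gap is what fine-grained methods aim to close.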

Purpose of the Study:

To introduce a novel approach for fine-grained video-language representation learning.
To model video-text interactions using multivariate cooperative game theory to handle uncertainty.
To develop a method that captures diverse granularity, flexible combinations, and vague intensity in semantic interactions.
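As a small illustration of the cooperative-game framing, the sketch below treats video clips and text words as players and computes each player's classic Banzhaf value: its marginal contribution averaged uniformly over all coalitions of the other players. The characteristic function `matched_pairs`, which just counts hypothetical clip-word matches, and the player names are toy assumptions; the summary does not say how the study assigns a value to a coalition.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List

# A characteristic function maps a coalition (set of players) to a real value.
Game = Callable[[FrozenSet[str]], float]

# Hypothetical ground-truth correspondences between video clips and words.
MATCHES = {("c1", "w1"), ("c2", "w2")}


def matched_pairs(coalition: FrozenSet[str]) -> float:
    """Toy value: number of matched (clip, word) pairs fully inside the coalition."""
    return float(sum(1 for clip, word in MATCHES if clip in coalition and word in coalition))


def banzhaf_value(players: List[str], v: Game, i: str) -> float:
    """Average marginal contribution of player i over all coalitions of the others."""
    others = [p for p in players if p != i]
    total = 0.0
    for size in range(len(others) + 1):
        for subset in combinations(others, size):
            s = frozenset(subset)
            total += v(s | {i}) - v(s)
    return total / 2 ** len(others)


if __name__ == "__main__":
    players = ["c1", "c2", "w1", "w2"]
    for p in players:
        print(p, banzhaf_value(players, matched_pairs, p))  # each value is 0.5
```

Each player adds value only in the half of coalitions that already contain its partner, hence 0.5. Exhaustive enumeration is exponential in the number of players, so it only suits toy games like this one.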

Main Methods:

Modeling video and text elements as players in a multivariate cooperative game.
Designing the Hierarchical Banzhaf Interaction (HBI) to simulate fine-grained correspondence between video clips and textual words from hierarchical perspectives.
Reconstructing representations by fusing single-modal and cross-modal components to mitigate bias and preserve adaptive encoding.
Extending the structure into a flexible encoder-decoder framework for adaptability to downstream tasks.
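The pairwise Banzhaf interaction index, a standard quantity in cooperative game theory, measures how much two players contribute jointly beyond what each contributes alone. The sketch below computes it for one clip and one word by exhaustive enumeration over a toy game. The summary does not describe HBI's hierarchical formulation or its learned value function, so the game, the player names, and the helper names here are hypothetical; this only shows the underlying index for a single level.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List

Game = Callable[[FrozenSet[str]], float]

# Hypothetical clip-word correspondences defining a toy cooperative game.
MATCHES = {("c1", "w1"), ("c2", "w2")}


def matched_pairs(coalition: FrozenSet[str]) -> float:
    """Toy value: number of matched (clip, word) pairs fully inside the coalition."""
    return float(sum(1 for clip, word in MATCHES if clip in coalition and word in coalition))


def banzhaf_interaction(players: List[str], v: Game, i: str, j: str) -> float:
    """Pairwise Banzhaf interaction index of players i and j.

    I(i, j) = 2^-(n-2) * sum over coalitions S that exclude i and j of
              v(S + {i, j}) - v(S + {i}) - v(S + {j}) + v(S)
    """
    others = [p for p in players if p not in (i, j)]
    total = 0.0
    for size in range(len(others) + 1):
        for subset in combinations(others, size):
            s = frozenset(subset)
            # Joint gain of adding i and j together, minus their separate gains.
            total += v(s | {i, j}) - v(s | {i}) - v(s | {j}) + v(s)
    return total / 2 ** len(others)


if __name__ == "__main__":
    players = ["c1", "c2", "w1", "w2"]
    print(banzhaf_interaction(players, matched_pairs, "c1", "w1"))  # 1.0
    print(banzhaf_interaction(players, matched_pairs, "c1", "w2"))  # 0.0
```

A matched clip-word pair gets a positive interaction and an unmatched pair gets zero, which is the kind of fine-grained correspondence signal a single global similarity score cannot express. The sum has 2^(n-2) terms, so exact computation is only feasible for very small games.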

Main Results:

The Hierarchical Banzhaf Interaction effectively simulates fine-grained semantic correspondence.
Representation reconstruction mitigates bias, ensuring fine granularity and adaptive encoding.
The encoder-decoder framework demonstrates flexibility across various tasks.
Experiments on text-video retrieval, video question answering, and video captioning benchmarks show superior performance compared with existing methods.
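Text-video retrieval is commonly scored with Recall@K, the fraction of text queries whose ground-truth video ranks among the top K candidates. The summary does not name the metrics reported, so the following is an assumed, generic sketch: it takes a query-by-candidate similarity matrix in which query q's correct video is candidate q, and the function name and toy scores are illustrative.

```python
from typing import List


def recall_at_k(sim: List[List[float]], k: int) -> float:
    """Fraction of text queries whose matching video ranks in the top k.

    sim[q][c] is the similarity between text query q and video candidate c;
    the ground-truth video for query q is assumed to be candidate q.
    """
    hits = 0
    for q, row in enumerate(sim):
        # 0-based rank of the correct candidate: how many candidates score strictly higher.
        rank = sum(1 for s in row if s > row[q])
        if rank < k:
            hits += 1
    return hits / len(sim)


if __name__ == "__main__":
    sim = [
        [0.9, 0.1, 0.2],
        [0.3, 0.2, 0.8],
        [0.1, 0.4, 0.7],
    ]
    print(recall_at_k(sim, 1))
    print(recall_at_k(sim, 3))
```

Counting only strictly higher scores resolves ties in favor of the correct candidate; a stricter evaluation would count tied candidates as ranking above it.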

Conclusions:

The proposed method significantly enhances fine-grained video-language representation learning.
The approach offers effective handling of uncertainty and diverse semantic interactions.
The method demonstrates strong effectiveness and generalization capabilities on multiple benchmarks.