A multi-scale self-supervised hypergraph contrastive learning framework for video question answering.

Zheng Wang¹, Bin Wu², Kaoru Ota³

¹Beijing Key Laboratory of Intelligent Telecommunications Software and Multimedia, Beijing University of Posts and Telecommunications, Beijing 100876, China; Muroran Institute of Technology, Muroran 050-8585, Japan.

Neural Networks : the Official Journal of the International Neural Network Society

September 29, 2023

Summary

This summary is machine-generated.

This study introduces a new Multi-scale Self-supervised Hypergraph Contrastive Learning (MSHCL) framework to improve video question answering (VideoQA). The MSHCL model enhances accuracy by capturing complex object relationships and leveraging self-supervised signals for better video understanding.

Keywords:

Data augmentation; High-order relations; Hypergraph contrastive learning; Multi-scale; Video question answering

Area of Science:

Artificial Intelligence
Computer Vision
Machine Learning

Background:

Video question answering (VideoQA) requires understanding multimodal information and object interactions.
Existing Graph Neural Network (GNN) models for VideoQA struggle with capturing high-order relations and leveraging self-supervised signals.

Purpose of the Study:

To propose a novel Multi-scale Self-supervised Hypergraph Contrastive Learning (MSHCL) framework for enhanced VideoQA.
To address limitations of existing GNN-based methods in capturing complex, high-order object relationships and utilizing self-supervised learning signals.

Main Methods:

Constructing a multi-scale temporal-spatial hypergraph to directly model high-order object relations using appearance and motion hyperedges.
Feeding hypergraph convolution features into a Transformer to capture global sequence information.
Employing a self-supervised hypergraph contrastive learning task with data augmentation and a question-guided multimodal interaction module.
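
The summary above gives no formulas, so the following is only a minimal sketch of two ingredients it names: a hypergraph convolution step and a contrastive objective over two augmented views. It assumes the normalised operator commonly used in the hypergraph neural network literature (Dv^-1/2 H W De^-1 H^T Dv^-1/2 X, without a learned weight matrix) and the standard InfoNCE loss. Whether MSHCL uses these exact forms is an assumption, and every name here (`hypergraph_conv`, `drop_hyperedge`, `info_nce`, the toy features) is hypothetical.

```python
import math


def hypergraph_conv(X, H, edge_weights=None):
    """One smoothing step: Dv^-1/2 H W De^-1 H^T Dv^-1/2 X (no learned weights).

    X: n_nodes x d feature rows; H: n_nodes x n_edges 0/1 incidence matrix.
    """
    n, m, d = len(H), len(H[0]), len(X[0])
    w = edge_weights or [1.0] * m
    # Vertex degree: weighted count of hyperedges that contain the node.
    dv = [sum(w[e] * H[v][e] for e in range(m)) for v in range(n)]
    # Edge degree: number of nodes inside the hyperedge.
    de = [sum(H[v][e] for v in range(n)) for e in range(m)]
    inv_sqrt_dv = [1.0 / math.sqrt(x) if x > 0 else 0.0 for x in dv]
    # Node -> hyperedge: weighted mean of the scaled member features.
    edge_feats = [
        [
            w[e] * sum(H[v][e] * inv_sqrt_dv[v] * X[v][k] for v in range(n)) / de[e]
            if de[e] > 0 else 0.0
            for k in range(d)
        ]
        for e in range(m)
    ]
    # Hyperedge -> node: sum over incident hyperedges, then rescale.
    return [
        [inv_sqrt_dv[v] * sum(H[v][e] * edge_feats[e][k] for e in range(m))
         for k in range(d)]
        for v in range(n)
    ]


def drop_hyperedge(H, e):
    """Hypothetical augmentation: remove hyperedge e by zeroing its column."""
    return [[0 if j == e else val for j, val in enumerate(row)] for row in H]


def info_nce(view_a, view_b, temperature=0.5):
    """InfoNCE: row i of view_a is the positive for row i of view_b."""
    def normalise(vec):
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]

    A = [normalise(v) for v in view_a]
    B = [normalise(v) for v in view_b]
    total = 0.0
    for i, a in enumerate(A):
        logits = [sum(x * y for x, y in zip(a, b)) / temperature for b in B]
        top = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_denominator - logits[i]
    return total / len(A)


# Toy scene: four objects, one "appearance" hyperedge {0, 1, 2}
# and one "motion" hyperedge {2, 3}. Values are made up for illustration.
H = [[1, 0], [1, 0], [1, 1], [0, 1]]
X = [[1.0, 0.0], [0.8, 0.2], [0.5, 0.5], [0.0, 1.0]]

view_1 = hypergraph_conv(X, drop_hyperedge(H, 1))
view_2 = hypergraph_conv(X, drop_hyperedge(H, 0))
contrastive_loss = info_nce(view_1, view_2)
```

A single hyperedge joins any number of objects at once, which is how an incidence matrix expresses the high-order relations that a pairwise adjacency matrix cannot. The multi-scale construction, the Transformer, and the question-guided interaction module are deliberately left out of this sketch.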

Main Results:

The proposed MSHCL framework demonstrates superior performance compared to state-of-the-art methods on three benchmark VideoQA datasets.
The model effectively captures high-order relations among multiple objects, overcoming limitations of traditional GNNs.
Self-supervised learning signals within the hypergraph structure significantly enhance accuracy and robustness.

Conclusions:

The MSHCL framework offers a more effective approach to VideoQA by directly modeling high-order relations and utilizing multi-scale self-supervised learning.
This method advances video understanding by improving the capture of complex temporal-spatial interactions and object semantics.
The findings suggest a promising direction for future research in multimodal understanding and question answering.