



A multi-scale self-supervised hypergraph contrastive learning framework for video question answering.

Zheng Wang¹, Bin Wu², Kaoru Ota³

  • ¹ Beijing Key Laboratory of Intelligent Telecommunications Software and Multimedia, Beijing University of Posts and Telecommunications, Beijing 100876, China; Muroran Institute of Technology, Muroran 050-8585, Japan.

Neural Networks: the Official Journal of the International Neural Network Society | September 29, 2023
Summary
This summary is machine-generated.

This study introduces a new Multi-scale Self-supervised Hypergraph Contrastive Learning (MSHCL) framework to improve video question answering (VideoQA). The MSHCL model enhances accuracy by capturing complex object relationships and leveraging self-supervised signals for better video understanding.

Keywords:
  • Data augmentation
  • High-order relations
  • Hypergraph contrastive learning
  • Multi-scale
  • Video question answering


Area of Science:

  • Artificial Intelligence
  • Computer Vision
  • Machine Learning

Background:

  • Video question answering (VideoQA) requires understanding multimodal information and object interactions.
  • Existing Graph Neural Network (GNN) models for VideoQA struggle with capturing high-order relations and leveraging self-supervised signals.
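To illustrate what "high-order relations" means here, the following is a minimal sketch, not taken from the paper. It assumes a toy clip with three hypothetical objects and two hypothetical interactions, and compares a hypergraph incidence matrix with the pairwise edges an ordinary graph would store. The names `incidence_matrix` and `clique_expansion` are illustrative only.

```python
# Hypothetical toy example: three objects detected in one video clip.
objects = ["person", "ball", "dog"]

# A hyperedge is a set of node indices and may join any number of nodes.
hyperedges = [
    {0, 1, 2},  # assumed three-way interaction among all objects
    {0, 1},     # assumed two-way interaction between person and ball
]


def incidence_matrix(num_nodes, hyperedges):
    """Return H where H[v][e] is 1 if node v belongs to hyperedge e, else 0."""
    return [[1 if v in e else 0 for e in hyperedges] for v in range(num_nodes)]


def clique_expansion(hyperedges):
    """Flatten hyperedges into the pairwise edges an ordinary graph would keep."""
    pairs = set()
    for e in hyperedges:
        members = sorted(e)
        for i, a in enumerate(members):
            for b in members[i + 1:]:
                pairs.add((a, b))
    return pairs


H = incidence_matrix(len(objects), hyperedges)
pairs = clique_expansion(hyperedges)
```

In this sketch the incidence matrix keeps the two interactions as separate columns, whereas the pairwise expansion yields the same three edges whether or not the three-way interaction was a single event. That loss of grouping is one way to read the limitation described above.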

Purpose of the Study:

  • To propose a novel Multi-scale Self-supervised Hypergraph Contrastive Learning (MSHCL) framework for enhanced VideoQA.
  • To address limitations of existing GNN-based methods in capturing complex, high-order object relationships and utilizing self-supervised learning signals.

Main Methods:

  • Constructing a multi-scale temporal-spatial hypergraph to directly model high-order object relations using appearance and motion hyperedges.
  • Combining hypergraph convolution features with a Transformer to capture global sequence information.
  • Employing a self-supervised hypergraph contrastive learning task with data augmentation and a question-guided multimodal interaction module.
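The methods above can be sketched in simplified form. The code below is an illustration under stated assumptions, not the MSHCL implementation: the convolution is a parameter-free mean aggregation (nodes to hyperedges and back), the augmentation is random hyperedge dropping, and the contrastive objective is a standard InfoNCE loss. The summary does not say which convolution, augmentation, or loss the authors use, so all function names and parameters (`hypergraph_conv`, `drop_hyperedges`, `info_nce`, `tau`, `p`) are hypothetical. The Transformer and the question-guided interaction module are omitted.

```python
import math
import random


def hypergraph_conv(H, X):
    """One smoothing step: average node features into hyperedges, then back to nodes."""
    n, m, d = len(H), len(H[0]), len(X[0])
    edge_feats = []
    for e in range(m):
        members = [v for v in range(n) if H[v][e]]
        if members:
            edge_feats.append([sum(X[v][k] for v in members) / len(members) for k in range(d)])
        else:
            edge_feats.append([0.0] * d)
    out = []
    for v in range(n):
        edges = [e for e in range(m) if H[v][e]]
        if edges:
            out.append([sum(edge_feats[e][k] for e in edges) / len(edges) for k in range(d)])
        else:
            out.append(list(X[v]))  # isolated node keeps its own features
    return out


def drop_hyperedges(H, p, rng):
    """Assumed augmentation: drop each hyperedge with probability p, keeping at least one."""
    m = len(H[0])
    keep = [e for e in range(m) if rng.random() >= p]
    if not keep:
        keep = [rng.randrange(m)]
    return [[row[e] for e in keep] for row in H]


def cosine(a, b):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)


def info_nce(Z1, Z2, tau=0.5):
    """InfoNCE: node i in view 1 should match node i in view 2 against all other nodes."""
    n = len(Z1)
    total = 0.0
    for i in range(n):
        logits = [cosine(Z1[i], Z2[j]) / tau for j in range(n)]
        mx = max(logits)
        log_denom = mx + math.log(sum(math.exp(l - mx) for l in logits))
        total += log_denom - logits[i]
    return total / n


rng = random.Random(0)
H = [[1, 1], [1, 1], [1, 0]]              # 3 nodes, 2 hyperedges (toy data)
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy node features
view_a = hypergraph_conv(drop_hyperedges(H, 0.3, rng), X)
view_b = hypergraph_conv(drop_hyperedges(H, 0.3, rng), X)
loss = info_nce(view_a, view_b)
```

In a trained model the node features would come from learned appearance and motion encoders and the convolution would carry learnable weights. The sketch only shows how two augmented views of the same hypergraph supply a training signal without labels.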

Main Results:

  • The proposed MSHCL framework demonstrates superior performance compared to state-of-the-art methods on three benchmark VideoQA datasets.
  • The model effectively captures high-order relations among multiple objects, overcoming limitations of traditional GNNs.
  • Self-supervised learning signals within the hypergraph structure significantly enhance accuracy and robustness.

Conclusions:

  • The MSHCL framework offers a more effective approach to VideoQA by directly modeling high-order relations and utilizing multi-scale self-supervised learning.
  • This method advances video understanding by improving the capture of complex temporal-spatial interactions and object semantics.
  • The findings suggest a promising direction for future research in multimodal understanding and question answering.