
Cross-Attentional Spatio-Temporal Semantic Graph Networks for Video Question Answering.

Yun Liu, Xiaoming Zhang, Feiran Huang

IEEE Transactions on Image Processing : a Publication of the IEEE Signal Processing Society

January 19, 2022

Summary

This summary is machine-generated.

This study introduces Cross-Attentional Spatio-Temporal Semantic Graph Networks (CASSG) for Video Question Answering (VideoQA). The model integrates inter- and intra-modality correlations in a single framework and is reported to outperform existing methods on three public VideoQA datasets.


Area of Science:

Artificial Intelligence
Computer Vision
Natural Language Processing

Background:

Video Question Answering (VideoQA) is challenging due to complex spatio-temporal content and multimodal relations.
Existing methods use attention mechanisms but struggle to integrate inter- and intra-modality correlations uniformly.
Effective integration of multimodal information is crucial for improving VideoQA comprehension.

Purpose of the Study:

To propose a novel VideoQA model, Cross-Attentional Spatio-Temporal Semantic Graph Networks (CASSG).
To address the limitation of current methods in uniformly integrating inter- and intra-modality correlations.
To enhance the comprehension ability of VideoQA systems by exploring fine-grained interactions.

Main Methods:

A multi-head multi-hop attention module explores cross-modal interactions.
Heterogeneous graphs (multi-stream spatio-temporal semantic graphs) are constructed for synchronous reasoning of correlations.
A global and local information fusion method is employed to infer the final answer.
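
The first of these steps, multi-head multi-hop attention, can be illustrated with a minimal sketch. This is not the authors' implementation: the summary gives no architectural details, so the sketch assumes plain scaled dot-product attention, heads formed by slicing the feature vector (no learned projections), and a residual update after each hop. All function names and the toy feature values are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(len(values[0]))])
    return out

def split_heads(vectors, num_heads):
    """Slice each vector into num_heads equal chunks (assumption: no learned projections)."""
    size = len(vectors[0]) // num_heads
    return [[v[h * size:(h + 1) * size] for v in vectors] for h in range(num_heads)]

def multi_head_cross_attention(queries, context, num_heads):
    """Run attention per head, then concatenate the head outputs for each query."""
    q_heads = split_heads(queries, num_heads)
    c_heads = split_heads(context, num_heads)
    per_head = [attend(q, c, c) for q, c in zip(q_heads, c_heads)]
    return [[x for head in per_head for x in head[i]] for i in range(len(queries))]

def multi_hop(question, video, num_heads=2, hops=2):
    """Refine question features by repeatedly attending over video features."""
    state = question
    for _ in range(hops):
        attended = multi_head_cross_attention(state, video, num_heads)
        # Residual update (assumption): add the attended context to the current state.
        state = [[s + a for s, a in zip(sv, av)] for sv, av in zip(state, attended)]
    return state

# Toy inputs: 2 question tokens and 3 video frames, each a 4-dimensional feature.
question = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
video = [[0.5, 0.1, 0.0, 0.2], [0.0, 0.9, 0.3, 0.0], [0.2, 0.2, 0.8, 0.1]]
refined = multi_hop(question, video)
```

Here `refined` has the same shape as `question`; each hop lets the question representation re-read the video conditioned on what the previous hop retrieved. The feature dimension must be divisible by `num_heads` for the slicing to work.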

Main Results:

The proposed CASSG model demonstrates effectiveness on three public VideoQA datasets.
Experimental results show superior performance compared to state-of-the-art methods.
Integrating inter- and intra-modality correlations is reported to improve VideoQA performance.

Conclusions:

The CASSG model offers a novel approach to VideoQA by effectively integrating multimodal information.
The proposed attention and graph-based methods enable synchronous reasoning of inter- and intra-modality correlations.
CASSG is reported to achieve state-of-the-art results on the three public VideoQA datasets evaluated.