Cross-Attentional Spatio-Temporal Semantic Graph Networks for Video Question Answering.

Yun Liu, Xiaoming Zhang, Feiran Huang

    IEEE Transactions on Image Processing: a Publication of the IEEE Signal Processing Society | January 19, 2022
    PubMed
    Summary
    This summary is machine-generated.

    This study introduces Cross-Attentional Spatio-Temporal Semantic Graph Networks (CASSG) for Video Question Answering (VideoQA). The model integrates inter- and intra-modality correlations in a unified way and is reported to outperform existing methods on three public VideoQA datasets.

    Area of Science:

    • Artificial Intelligence
    • Computer Vision
    • Natural Language Processing

    Background:

    • Video Question Answering (VideoQA) is challenging due to complex spatio-temporal content and multimodal relations.
    • Existing methods use attention mechanisms but struggle to integrate inter- and intra-modality correlations uniformly.
    • Effective integration of multimodal information is crucial for improving VideoQA comprehension.

    Purpose of the Study:

    • To propose a novel VideoQA model, Cross-Attentional Spatio-Temporal Semantic Graph Networks (CASSG).
    • To address the limitation of current methods in uniformly integrating inter- and intra-modality correlations.
    • To enhance the comprehension ability of VideoQA systems by exploring fine-grained interactions.

    Main Methods:

    • A multi-head multi-hop attention module explores cross-modal interactions.
    • Heterogeneous graphs (multi-stream spatio-temporal semantic graphs) are constructed for synchronous reasoning of correlations.
    • A global and local information fusion method is employed to infer the final answer.
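A minimal sketch of the first step above, multi-head multi-hop cross-modal attention, may help make the idea concrete. It is not the authors' implementation: the function names, the residual update, the absence of learned projection matrices, and the toy feature vectors are all assumptions made for illustration. Graph construction and global/local fusion are not covered here.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_attend(queries, context):
    """Scaled dot-product attention: each query attends over the other modality."""
    scale = math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        weights = softmax([dot(q, c) / scale for c in context])
        out.append([sum(w * c[d] for w, c in zip(weights, context))
                    for d in range(len(q))])
    return out

def split_heads(vectors, heads):
    """Slice each feature vector into `heads` equal chunks (hypothetical helper)."""
    if len(vectors[0]) % heads != 0:
        raise ValueError("feature size must be divisible by the number of heads")
    d = len(vectors[0]) // heads
    return [[v[h * d:(h + 1) * d] for v in vectors] for h in range(heads)]

def multi_head_cross_attend(queries, context, heads):
    """Run cross_attend independently per head, then concatenate the results."""
    q_heads = split_heads(queries, heads)
    c_heads = split_heads(context, heads)
    per_head = [cross_attend(q, c) for q, c in zip(q_heads, c_heads)]
    return [sum((per_head[h][i] for h in range(heads)), [])
            for i in range(len(queries))]

def multi_hop_cross_attention(video, question, heads=2, hops=2):
    """Each hop updates both modalities with what they attend to in the other.

    Assumption: a plain residual sum stands in for the learned update a real
    model would use.
    """
    for _ in range(hops):
        v_ctx = multi_head_cross_attend(video, question, heads)
        q_ctx = multi_head_cross_attend(question, video, heads)
        video = [[a + b for a, b in zip(v, c)] for v, c in zip(video, v_ctx)]
        question = [[a + b for a, b in zip(q, c)] for q, c in zip(question, q_ctx)]
    return video, question

# Toy inputs: 3 video-frame features and 2 question-word features, 4 dims each.
video = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0], [0.5, 0.5, 0.5, 0.5]]
question = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
v_out, q_out = multi_hop_cross_attention(video, question, heads=2, hops=2)
```

Each hop lets the video features see question features that were themselves already updated by the video, which is one way to read "multi-hop" here. The number of items per modality and the feature size are unchanged by the module.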

    Main Results:

    • The proposed CASSG model was evaluated on three public VideoQA datasets.
    • On these datasets, it is reported to outperform state-of-the-art methods.
    • The gains are attributed to integrating inter- and intra-modality correlations in a single model.

    Conclusions:

    • The CASSG model offers a novel approach to VideoQA by effectively integrating multimodal information.
    • The proposed attention and graph-based methods enable synchronous reasoning of inter- and intra-modality correlations.
    • CASSG achieves state-of-the-art results on the evaluated VideoQA datasets.