Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

Mycoelectronics: Bioprinted living fungal bioelectronics for artificial sensation.

Proceedings of the National Academy of Sciences of the United States of America·2026
Same author

Uranyl-Bridged Polyoxometalate Assembled from Te-Oriented Three-Layered Cage Clusters for Photocatalytic Oxy-Thiocyanation of Alkenes.

Inorganic chemistry·2026
Same author

AWM-Fuse: Multi-Modality Image Fusion for Adverse Weather via Global and Local Text Perception.

IEEE transactions on image processing : a publication of the IEEE Signal Processing Society·2026
Same author

Evolving classifiers with background suppression transformer for open-set long-tailed class-incremental remote sensing scene classification.

Neural networks : the official journal of the International Neural Network Society·2026
Same author

Early postnatal antibiotic-associated gut microbiota alterations might promote long-term lipid metabolism via brown adipose tissue metabolic programming.

Gut microbes·2026
Same author

Predictive Value of Nutritional Status in Sputum Culture Conversion Among Patients with Nontuberculous Mycobacterial Pulmonary Disease: A Retrospective Cohort Study.

Infection and drug resistance·2026
Same journal

Change-Prior-Guided Unsupervised Change Detection of Heterogeneous Remote Sensing Images.

IEEE transactions on image processing : a publication of the IEEE Signal Processing Society·2026
Same journal

AgonicDreamer: Enhancing Multi-View Consistency in Text-to-3D Generation via Rectified Score Distillation.

IEEE transactions on image processing : a publication of the IEEE Signal Processing Society·2026
Same journal

BiCM-Prompt: Bidirectional Cross-Modal Prompt Tuning for Class-Incremental Learning on Multisource Remote Sensing Images.

IEEE transactions on image processing : a publication of the IEEE Signal Processing Society·2026
Same journal

GoP-based Quality Enhancement on Video Compression.

IEEE transactions on image processing : a publication of the IEEE Signal Processing Society·2026
Same journal

Align then Tensorize: Multi-Level Consistent Anchor Graph Learning for Scalable Multi-View Clustering.

IEEE transactions on image processing : a publication of the IEEE Signal Processing Society·2026
Same journal

Beyond Fidelity: Diverse Image Synthesis via Retrieval-Augmented Diffusion.

IEEE transactions on image processing : a publication of the IEEE Signal Processing Society·2026

Related Experiment Video

Updated: Jan 11, 2026

Combining Eye-tracking Data with an Analysis of Video Content from Free-viewing a Video of a Walk in an Urban Park Environment

Published on: May 7, 2019


Disentangling Inter- and Intra-Video Relations for Multi-Event Video-Text Retrieval and Grounding.

Mengzhao Wang, Huafeng Li, Yafei Zhang

    IEEE Transactions on Image Processing: a Publication of the IEEE Signal Processing Society, November 14, 2025
    Summary
    This summary is machine-generated.

    This study introduces a method that jointly performs multi-event video-text retrieval and grounding, improving retrieval accuracy for complex, multi-event queries. It also locates the individual events within the retrieved videos, which addresses a limitation of existing approaches that focus on single events.

    More Related Videos

    Eye Tracking During Visually Situated Language Comprehension: Flexibility and Limitations in Uncovering Visual Context Effects

    Published on: November 30, 2018

    A Methodology for Capturing Joint Visual Attention Using Mobile Eye-Trackers

    Published on: January 18, 2020



    Area of Science:

    • Computer Vision
    • Artificial Intelligence
    • Information Retrieval

    Background:

    • Current video-text retrieval methods struggle with multi-event queries and fail to locate specific events within videos.
    • Existing approaches are limited to single-event text queries and do not provide precise event localization.
    • There is a need for methods that can handle complex, multi-event textual descriptions and ground these events in video content.

    Purpose of the Study:

    • To propose a novel method for jointly addressing multi-event video-text retrieval and grounding.
    • To enhance the precision of video-text alignment by considering inter- and intra-video event relationships.
    • To localize (ground) each of the multiple events described in a textual query within the retrieved videos.

    Main Methods:

    • Developed a Relational Event-Centric Video-Text Retrieval module utilizing hierarchical event relationships for multi-level contrastive learning.
    • Introduced Event Contrast-Driven Video Grounding to precisely locate multiple events by accounting for their positions on a temporal score map.
    • Leveraged both inter-video and intra-video event relationships to improve retrieval and grounding performance.
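    The summary describes both method components only at a high level. As a rough illustration of multi-level contrastive learning, the following Python sketch combines a video-level (inter-video) symmetric InfoNCE loss across a batch with an event-level (intra-video) loss among the events of each video. It then decodes one span per sentence from a 2D (start, end) temporal score map by plain argmax. InfoNCE, the temperature `tau`, the loss weighting, the 2D map layout and all function names are assumptions for illustration. The paper's exact objectives and its event-contrast grounding are not reproduced.

```python
# Hypothetical sketch: not the authors' implementation.
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def info_nce(xs: List[Vector], ys: List[Vector], tau: float = 0.07) -> float:
    """Symmetric InfoNCE: xs[i] and ys[i] form the positive pair and every
    other item in the batch acts as a negative (assumed loss form)."""
    n = len(xs)
    sim = [[cosine(x, y) / tau for y in ys] for x in xs]
    sim_t = [[sim[i][j] for i in range(n)] for j in range(n)]  # transpose: y-to-x

    def cross_entropy(rows: List[List[float]]) -> float:
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # log-sum-exp with max subtraction for stability
            total += m + math.log(sum(math.exp(s - m) for s in row)) - row[i]
        return total / len(rows)

    return 0.5 * (cross_entropy(sim) + cross_entropy(sim_t))


def multi_level_loss(videos, queries, events, sentences, weight: float = 1.0) -> float:
    """Hypothetical two-level objective.

    videos[i], queries[i]:  video embedding and multi-event query embedding
    events[i], sentences[i]: per-event clip and sentence embeddings of video i
    """
    inter = info_nce(videos, queries)  # inter-video: contrast across the batch
    # intra-video: negatives are the other events of the same video only
    intra = [info_nce(ev, se) for ev, se in zip(events, sentences)]
    return inter + weight * sum(intra) / len(intra)


def ground_events(score_maps: List[List[List[float]]]) -> List[Tuple[int, int]]:
    """For each sentence, return the (start, end) clip pair with the highest
    score on its 2D temporal map, considering only cells with start <= end."""
    spans = []
    for smap in score_maps:
        n = len(smap)
        best = max(((i, j) for i in range(n) for j in range(i, n)),
                   key=lambda ij: smap[ij[0]][ij[1]])
        spans.append(best)
    return spans
```

    In this sketch the event-level term draws its negatives only from the same video. That is one simple reading of "intra-video" relations; the paper may define the hierarchy differently. The per-sentence argmax ignores other events, whereas the summary says the paper's grounding accounts for event positions on the score map.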

    Main Results:

    • The proposed method is reported to outperform existing approaches on the ActivityNet Captions and Charades-STA benchmark datasets.
    • The gains are reported for both multi-event video-text retrieval and event grounding.
    • Demonstrated the effectiveness of the joint framework in handling complex textual queries and localizing multiple events.
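    The summary does not state which metrics back these results. Text-to-video retrieval and temporal grounding on ActivityNet Captions and Charades-STA are commonly scored with Recall@K and with recall at temporal-IoU thresholds. The following minimal Python sketch covers both, assuming one ground-truth video per query and one top-1 predicted span per event. The function names are hypothetical and no results from the paper are reproduced.

```python
# Hypothetical evaluation helpers; metric choice is an assumption.
from typing import List, Sequence, Tuple


def recall_at_k(sim: Sequence[Sequence[float]], k: int) -> float:
    """Text-to-video Recall@K: sim[q][v] scores query q against video v, and
    the ground-truth video for query q is assumed to be video q."""
    hits = 0
    for q, row in enumerate(sim):
        ranked = sorted(range(len(row)), key=lambda v: row[v], reverse=True)
        hits += q in ranked[:k]  # True counts as 1
    return hits / len(sim)


def temporal_iou(pred: Tuple[float, float], gt: Tuple[float, float]) -> float:
    """Intersection-over-union of two [start, end] time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0


def recall_at_iou(preds: List[Tuple[float, float]],
                  gts: List[Tuple[float, float]],
                  threshold: float) -> float:
    """Fraction of events whose top-1 predicted span reaches the IoU threshold."""
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(preds, gts))
    return hits / len(gts)
```

    For disjoint intervals the intersection is zero, so using the hull as the union still yields an IoU of 0.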

    Conclusions:

    • The novel joint framework effectively addresses limitations in multi-event video-text retrieval and grounding.
    • The method advances text-based search and understanding of video content when queries describe multiple events.
    • This research provides a foundation for future advancements in video analysis and retrieval applications.