Disentangling Inter- and Intra-Video Relations for Multi-Event Video-Text Retrieval and Grounding.

Mengzhao Wang, Huafeng Li, Yafei Zhang

IEEE Transactions on Image Processing: a Publication of the IEEE Signal Processing Society

November 14, 2025

Summary

This summary is machine-generated.

This study introduces a new method for multi-event video-text retrieval and grounding, improving search accuracy for complex queries. It effectively locates multiple events within videos, overcoming limitations of existing single-event focused approaches.

Area of Science:

Computer Vision
Artificial Intelligence
Information Retrieval

Background:

Current video-text retrieval methods struggle with multi-event queries and fail to locate specific events within videos.
Existing approaches are limited to single-text queries and do not provide precise event localization.
There is a need for methods that can handle complex, multi-event textual descriptions and ground these events in video content.

Purpose of the Study:

To propose a novel method for jointly addressing multi-event video-text retrieval and grounding.
To enhance the precision of video-text alignment by considering inter- and intra-video event relationships.
To accurately locate and ground multiple events within retrieved videos based on textual queries.

Main Methods:

Developed a Relational Event-Centric Video-Text Retrieval module utilizing hierarchical event relationships for multi-level contrastive learning.
Introduced Event Contrast-Driven Video Grounding to precisely locate multiple events by accounting for their positions on a temporal score map.
Leveraged both inter-video and intra-video event relationships to improve retrieval and grounding performance.
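
To make the contrastive-learning idea concrete, here is a minimal sketch of a symmetric InfoNCE loss between paired video and text embeddings, written in plain Python. It is an illustration under assumptions, not the authors' implementation: the function names, the temperature value of 0.07, and the use of cosine similarity are all hypothetical choices, and the sketch covers a single contrast level only. The summary does not say how the hierarchical event levels are defined or combined.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce(video_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss; video_embs[i] is assumed to pair with text_embs[i].

    The temperature default is an illustrative assumption, not a value from the paper.
    """
    videos = [normalize(v) for v in video_embs]
    texts = [normalize(t) for t in text_embs]
    n = len(videos)
    # sims[i][j]: temperature-scaled cosine similarity between video i and text j.
    sims = [[dot(v, t) / temperature for t in texts] for v in videos]

    def mean_cross_entropy(rows):
        # Cross-entropy with the matching pair (the diagonal) as the target,
        # using the log-sum-exp trick for numerical stability.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            log_denom = m + math.log(sum(math.exp(s - m) for s in row))
            total += log_denom - row[i]
        return total / n

    # Transpose to score each text against all videos.
    cols = [[sims[i][j] for i in range(n)] for j in range(n)]
    return 0.5 * (mean_cross_entropy(sims) + mean_cross_entropy(cols))
```

The loss is small when each video is most similar to its own text and large when pairs are mismatched. A multi-level variant would presumably apply a loss of this kind at more than one granularity (for example, whole videos and individual events), but that is an assumption about the design.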

Main Results:

The proposed method significantly outperforms existing approaches on the ActivityNet Captions and Charades-STA benchmark datasets.
Achieved superior performance in both multi-event video-text retrieval and precise event grounding.
Demonstrated the effectiveness of the joint framework in handling complex textual queries and localizing multiple events.
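
The summary does not state which metrics were used. Temporal grounding is commonly scored by the temporal intersection-over-union (IoU) between a predicted span and the ground-truth span, so the sketch below shows that calculation as one plausible way "precise event grounding" could be measured. The function names, the 0.5 threshold, and the one-to-one pairing of predictions with ground-truth events are assumptions for illustration.

```python
def temporal_iou(pred, gt):
    """IoU of two (start, end) time spans, e.g. in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_iou(predictions, ground_truths, threshold=0.5):
    """Fraction of events whose paired prediction reaches the IoU threshold.

    Assumes predictions[i] is the span predicted for ground_truths[i].
    """
    hits = sum(
        temporal_iou(p, g) >= threshold
        for p, g in zip(predictions, ground_truths)
    )
    return hits / len(ground_truths)
```

For a query describing two events, `recall_at_iou` would be called with one predicted span per event, so that a video counts as well grounded only when each event is localized, not just the most salient one.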

Conclusions:

The novel joint framework effectively addresses limitations in multi-event video-text retrieval and grounding.
The method offers a significant advancement in searching and understanding video content based on complex textual descriptions.
This research provides a foundation for future advancements in video analysis and retrieval applications.