Exploiting Unlabeled Videos for Video-Text Retrieval via Pseudo-Supervised Learning.

Yu Lu, Ruijie Quan, Linchao Zhu

IEEE Transactions on Image Processing : a Publication of the IEEE Signal Processing Society

March 3, 2025

Summary

This summary is machine-generated.

This study introduces Pseudo-Supervised Selective Contrastive Learning (PS-SCL) for video-text retrieval, which reduces reliance on manual annotations. PS-SCL trains models on automatically generated pseudo-texts using selective contrastive learning, and outperforms CLIP's zero-shot performance on the MSRVTT, DiDeMo, and ActivityNet benchmarks.

Area of Science:

Computer Science
Artificial Intelligence
Machine Learning

Background:

Large-scale pre-trained vision-language models like CLIP excel at video-text retrieval (VTR).
Traditional VTR methods require costly, labor-intensive manual annotation of video-text pairs.
Existing techniques often fine-tune models directly on clean, annotated data, limiting scalability.

Purpose of the Study:

To develop a novel approach for video-text retrieval that minimizes dependency on manual text annotations.
To leverage unlabeled video data for training more efficient and scalable VTR models.
To enhance multi-modal learning under weak supervision conditions.

Main Methods:

Introduced Pseudo-Supervised Selective Contrastive Learning (PS-SCL) to generate pseudo-supervisions from unlabeled video data.
Utilized CLIP's visual recognition to automatically generate pseudo-texts, providing weak textual guidance.
Developed Selective Contrastive Learning (SeLeCT) to prioritize and select highly correlated pseudo-supervised video-text pairs for effective multi-modal learning.
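
The two steps above (pseudo-text generation and selective contrastive learning) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the summary does not specify how pseudo-texts are composed or how pairs are selected, so the prompt template, the `concept_embs` vocabulary, the `keep_ratio` parameter, and the rule "keep the pairs with the highest matched similarity" are all assumptions made for illustration. Plain lists stand in for CLIP embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def make_pseudo_text(video_emb, concept_embs, top_k=2):
    """Hypothetical pseudo-text: name the top_k concepts closest to the video.

    concept_embs maps a concept word to its (assumed precomputed) embedding.
    """
    ranked = sorted(
        concept_embs,
        key=lambda c: cosine(video_emb, concept_embs[c]),
        reverse=True,
    )
    return "a video of " + " and ".join(ranked[:top_k])

def selective_contrastive_loss(video_embs, text_embs,
                               keep_ratio=0.5, temperature=0.07):
    """Video-to-text InfoNCE loss averaged over the selected pairs only.

    Pair i is (video_embs[i], text_embs[i]). Assumed selection rule:
    keep the keep_ratio fraction of pairs with the highest matched similarity.
    Returns (loss, indices of the selected pairs).
    """
    n = len(video_embs)
    sim = [[cosine(v, t) for t in text_embs] for v in video_embs]
    n_keep = max(1, int(n * keep_ratio))
    selected = sorted(range(n), key=lambda i: sim[i][i], reverse=True)[:n_keep]
    loss = 0.0
    for i in selected:
        logits = [s / temperature for s in sim[i]]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_denom - logits[i]
    return loss / n_keep, selected

# Toy usage with made-up 3-dimensional "embeddings".
concepts = {"dog": [1.0, 0.0, 0.0], "park": [0.0, 1.0, 0.0], "car": [0.0, 0.0, 1.0]}
print(make_pseudo_text([0.9, 0.4, 0.1], concepts))

videos = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
texts = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]  # third pseudo-text is a poor match
loss, kept = selective_contrastive_loss(videos, texts, keep_ratio=0.7)
print(kept, loss)
```

In this toy run the third pair has zero matched similarity, so it is dropped before the loss is computed. The point of the sketch is only that noisy pseudo-supervised pairs can be filtered out so they do not contribute to the contrastive objective.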

Main Results:

PS-SCL significantly outperforms CLIP's zero-shot performance across multiple video-text retrieval benchmarks.
Improvements over CLIP zero-shot in video-to-text retrieval include 8.2% R@1 on MSRVTT, 12.2% R@1 on DiDeMo, and 10.9% R@1 on ActivityNet.
Demonstrated the effectiveness of pseudo-supervision and selective contrastive learning in weak pairing scenarios.
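
For readers unfamiliar with the R@1 metric reported above, the following Python sketch shows how Recall@K is commonly computed from a query-by-candidate similarity matrix. It assumes each query has exactly one ground-truth candidate, stored at the same index. The function name `recall_at_k` and the toy scores are illustrative and are not taken from the paper.

```python
def recall_at_k(sim, k=1):
    """Fraction of queries whose ground-truth candidate ranks in the top k.

    sim[i][j] is the similarity between query i and candidate j;
    the ground truth for query i is assumed to be candidate i.
    """
    hits = 0
    for i, row in enumerate(sim):
        # Zero-based rank of the ground truth: how many candidates score higher.
        rank = sum(1 for s in row if s > row[i])
        if rank < k:
            hits += 1
    return hits / len(sim)

# Toy video-to-text similarity matrix (rows: videos, columns: texts).
sim = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.1, 0.2, 0.7],
]
print(recall_at_k(sim, k=1))
print(recall_at_k(sim, k=2))
```

Transposing the matrix, so that rows are texts and columns are videos, gives the text-to-video direction under the same assumptions.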

Conclusions:

PS-SCL offers a scalable and effective alternative to traditional VTR methods that rely on extensive manual annotations.
The proposed approach successfully bridges the gap between visual content and textual descriptions using weak supervision.
This work advances the field of video-text retrieval by enabling robust multi-modal learning with reduced annotation costs.