


Exploiting Unlabeled Videos for Video-Text Retrieval via Pseudo-Supervised Learning.

Yu Lu, Ruijie Quan, Linchao Zhu

    IEEE Transactions on Image Processing : a Publication of the IEEE Signal Processing Society
    |March 3, 2025
    PubMed
    Summary
    This summary is machine-generated.

    This study introduces Pseudo-Supervised Selective Contrastive Learning (PS-SCL) for video-text retrieval, reducing reliance on manual annotations. PS-SCL effectively trains models using automatically generated pseudo-texts and selective contrastive learning, improving performance on benchmarks.


    Area of Science:

    • Computer Science
    • Artificial Intelligence
    • Machine Learning

    Background:

    • Large-scale pre-trained vision-language models like CLIP excel at video-text retrieval (VTR).
    • Traditional VTR methods require costly, labor-intensive manual annotation of video-text pairs.
    • Existing techniques often fine-tune models directly on clean, annotated data, limiting scalability.

    Purpose of the Study:

    • To develop a novel approach for video-text retrieval that minimizes dependency on manual text annotations.
    • To leverage unlabeled video data for training more efficient and scalable VTR models.
    • To enhance multi-modal learning under weak supervision conditions.

    Main Methods:

    • Introduced Pseudo-Supervised Selective Contrastive Learning (PS-SCL) to generate pseudo-supervisions from unlabeled video data.
    • Used CLIP's visual recognition capabilities to automatically generate pseudo-texts that serve as weak textual guidance.
    • Developed Selective Contrastive Learning (SeLeCT) to prioritize and select highly correlated pseudo-supervised video-text pairs for effective multi-modal learning.
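The two steps in the list above, assigning pseudo-texts and then training only on well-matched pairs, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the `keep_ratio` and `temperature` parameters, and the rule of keeping the top fraction of pairs by cosine similarity are all assumptions made for illustration, and the embeddings are assumed to be precomputed by a CLIP-style encoder.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)

def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def assign_pseudo_texts(video_emb, candidate_text_emb):
    """Hypothetical pseudo-labeling step: for each video, pick the most
    similar candidate text and return its index and similarity score."""
    sims = l2_normalize(video_emb) @ l2_normalize(candidate_text_emb).T
    return sims.argmax(axis=1), sims.max(axis=1)

def selective_contrastive_loss(video_emb, text_emb, keep_ratio=0.5, temperature=0.07):
    """Symmetric InfoNCE loss computed only on the most correlated pairs.

    Assumption: 'selection' means keeping the top `keep_ratio` fraction of
    (video, pseudo-text) pairs ranked by cosine similarity.
    """
    v = l2_normalize(video_emb)
    t = l2_normalize(text_emb)
    pair_sim = np.sum(v * t, axis=1)                 # similarity of each pair
    n_keep = max(1, int(round(keep_ratio * len(pair_sim))))
    keep = np.argsort(-pair_sim)[:n_keep]            # indices of retained pairs
    logits = (v[keep] @ t[keep].T) / temperature     # retained pairs only
    diag = np.arange(n_keep)
    loss_v2t = -log_softmax(logits, axis=1)[diag, diag].mean()
    loss_t2v = -log_softmax(logits, axis=0)[diag, diag].mean()
    return 0.5 * (loss_v2t + loss_t2v), keep
```

In this sketch, pairs with low similarity never enter the loss, which is one simple way to keep noisy pseudo-texts from dominating training; the paper's actual selection criterion may differ.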

    Main Results:

    • PS-SCL significantly outperforms CLIP's zero-shot performance across multiple video-text retrieval benchmarks.
    • Improved video-to-text R@1 over zero-shot CLIP by 8.2% on MSRVTT, 12.2% on DiDeMo, and 10.9% on ActivityNet.
    • Demonstrated the effectiveness of pseudo-supervision and selective contrastive learning in weak pairing scenarios.
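For readers unfamiliar with the R@1 metric reported above, the following is a small sketch of how Recall@K can be computed for video-to-text retrieval. The function name `recall_at_k` is hypothetical, and the sketch assumes a square similarity matrix in which video i has exactly one ground-truth text, text i; real benchmarks may pair a video with several captions.

```python
import numpy as np

def recall_at_k(sim, k=1):
    """Fraction of queries whose ground-truth item ranks in the top k.

    sim[i, j] is the similarity between query video i and candidate text j.
    Assumption: the ground-truth text for video i is text i.
    """
    sim = np.asarray(sim, dtype=float)
    true_scores = np.diag(sim)[:, None]
    # Rank of the ground truth = number of candidates scoring strictly higher.
    ranks = (sim > true_scores).sum(axis=1)
    return float((ranks < k).mean())

# Toy example: videos 0 and 2 retrieve their own text first; video 1 does not.
sim = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.2, 0.7],
])
r1 = recall_at_k(sim, k=1)  # 2 of 3 queries correct at rank 1
r2 = recall_at_k(sim, k=2)  # all 3 queries correct within the top 2
```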

    Conclusions:

    • PS-SCL offers a scalable and effective alternative to traditional VTR methods that rely on extensive manual annotations.
    • The proposed approach successfully bridges the gap between visual content and textual descriptions using weak supervision.
    • This work advances the field of video-text retrieval by enabling robust multi-modal learning with reduced annotation costs.