Efficient Token-Guided Image-Text Retrieval With Consistent Multimodal Contrastive Training.

Chong Liu, Yuqi Zhang, Hongsong Wang

    IEEE Transactions on Image Processing: a Publication of the IEEE Signal Processing Society | June 20, 2023
    Summary
    This summary is machine-generated.

    This study introduces a unified image-text retrieval framework that combines coarse- and fine-grained representations. Its Token-Guided Dual Transformer (TGDT) architecture improves both retrieval accuracy and efficiency.

    Area of Science:

    • Computer Science
    • Artificial Intelligence
    • Machine Learning

    Background:

    • Image-text retrieval is crucial for understanding semantic relationships between visual and linguistic data.
    • Existing methods often focus on either global or local features and neglect the interplay between them, which results in suboptimal accuracy and high computational cost.
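The difference between global (coarse-grained) and local (fine-grained) matching can be illustrated with a minimal sketch. It is not the TGDT method: it assumes pre-computed embedding vectors, and the function names (`global_similarity`, `local_similarity`) and the max-then-average word-to-region matching rule are illustrative choices, not details taken from the paper.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def global_similarity(img_global: Vector, txt_global: Vector) -> float:
    """Coarse-grained score (hypothetical helper): one comparison between a
    whole-image embedding and a whole-sentence embedding."""
    return cosine(img_global, txt_global)


def local_similarity(img_tokens: List[Vector], txt_tokens: List[Vector]) -> float:
    """Fine-grained score (hypothetical helper): match each word token to its
    best image token, then average. Costs len(img_tokens) * len(txt_tokens)
    comparisons per image-text pair."""
    if not img_tokens or not txt_tokens:
        return 0.0
    best = [max(cosine(w, r) for r in img_tokens) for w in txt_tokens]
    return sum(best) / len(best)
```

The coarse score needs one vector comparison per image-text pair, while the fine score needs one per (word token, image token) combination, which is where the extra computational cost of local matching comes from.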

    Purpose of the Study:

    • To develop a novel framework that integrates coarse- and fine-grained representation learning for enhanced image-text retrieval.
    • To improve retrieval accuracy and reduce computational complexity in multimodal understanding tasks.

    Main Methods:

    • Proposed the Token-Guided Dual Transformer (TGDT) architecture, which consists of two homogeneous branches, one for images and one for text.
    • Introduced a Consistent Multimodal Contrastive (CMC) loss to ensure semantic consistency across modalities in a shared embedding space.
    • Implemented a two-stage inference method that uses a mix of global and local cross-modal similarities.
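The summary describes the CMC loss only as enforcing semantic consistency across modalities in a shared embedding space. For orientation, a generic symmetric batch contrastive loss (InfoNCE-style) is sketched here. It is a swapped-in standard technique, not the paper's CMC loss, and the function name and the temperature of 0.07 are assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def _normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def _logsumexp(xs: List[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def symmetric_contrastive_loss(img: Matrix, txt: Matrix, temperature: float = 0.07) -> float:
    """Generic InfoNCE-style loss (assumed stand-in, NOT the paper's CMC loss).

    img[i] and txt[i] form the positive pair; every other pairing in the
    batch serves as a negative. Image->text and text->image terms are averaged.
    """
    if not img or len(img) != len(txt):
        raise ValueError("img and txt must be non-empty and the same length")
    img = [_normalize(v) for v in img]
    txt = [_normalize(v) for v in txt]
    n = len(img)
    # logits[i][j]: cosine similarity of image i and text j, scaled by temperature
    logits = [[sum(a * b for a, b in zip(img[i], txt[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Cross-entropy with the diagonal as target, row-wise (image -> text)
    i2t = sum(_logsumexp(logits[i]) - logits[i][i] for i in range(n)) / n
    # ...and column-wise (text -> image)
    t2i = sum(_logsumexp([logits[i][j] for i in range(n)]) - logits[j][j]
              for j in range(n)) / n
    return 0.5 * (i2t + t2i)
```

Matched pairs in the batch are pulled together and every mismatched pairing is pushed apart, which is the basic mechanism a shared image-text embedding space relies on.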

    Main Results:

    • Achieved state-of-the-art retrieval performance on benchmark datasets.
    • Demonstrated significantly lower inference time than representative existing methods.
    • The unified framework effectively leverages both coarse- and fine-grained information.
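One way a two-stage scheme can reduce inference time is to shortlist candidates with a cheap global score and apply the costlier mixed global-local score only to the shortlist. The sketch assumes pre-computed embeddings; the shortlist size `k`, the mixing weight `alpha`, and the function names are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Tuple

Vector = List[float]
Candidate = Tuple[Vector, List[Vector]]  # (global embedding, token embeddings)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def local_score(q_tokens: List[Vector], c_tokens: List[Vector]) -> float:
    """Hypothetical fine-grained score: best match per query token, averaged."""
    if not q_tokens or not c_tokens:
        return 0.0
    best = [max(cosine(q, c) for c in c_tokens) for q in q_tokens]
    return sum(best) / len(best)


def two_stage_retrieve(query_global: Vector, query_tokens: List[Vector],
                       gallery: List[Candidate], k: int = 2,
                       alpha: float = 0.5) -> List[int]:
    """Return gallery indices, best first (hypothetical sketch, not TGDT).

    Stage 1: rank every candidate by the cheap global cosine and keep the top k.
    Stage 2: re-rank only those k by alpha * global + (1 - alpha) * local.
    """
    global_scores = [cosine(query_global, g) for g, _ in gallery]
    shortlist = sorted(range(len(gallery)), key=lambda i: global_scores[i],
                       reverse=True)[:k]

    def mixed(i: int) -> float:
        return alpha * global_scores[i] + (1 - alpha) * local_score(query_tokens, gallery[i][1])

    return sorted(shortlist, key=mixed, reverse=True)
```

With a gallery of N candidates, stage 1 costs N vector comparisons, and the token-level matching runs only k times instead of N.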

    Conclusions:

    • The proposed TGDT architecture offers a more effective and efficient approach to image-text retrieval by unifying multimodal representations.
    • The CMC loss and two-stage inference method contribute to superior semantic understanding and retrieval accuracy.
    • This work offers a new perspective on multimodal learning that aligns with human cognitive processes.