Adaptive Latent Graph Representation Learning for Image-Text Matching.

Mengxiao Tian, Xinxiao Wu, Yunde Jia

IEEE Transactions on Image Processing: A Publication of the IEEE Signal Processing Society | April 4, 2023

Summary

This summary is machine-generated.

This study introduces an adaptive latent graph representation learning method that improves image-text matching by reducing distractions from irrelevant image entities and noisy words. The approach narrows the modality gap by learning a better common visual-textual embedding space.

Area of Science:

Computer Vision
Natural Language Processing
Machine Learning

Background:

Image-text matching is difficult because of the modality gap between visual and textual representations.
Existing methods that model entity relationships are susceptible to irrelevant visual and textual information.
Distractions in entity relationships hinder the learning of effective common embedding spaces.
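
As a rough illustration of what matching in a common embedding space means, the sketch below ranks candidate captions for one image by cosine similarity. It is not the paper's model: the three-dimensional vectors and the function names (l2_normalize, cosine_similarity, rank_texts) are hypothetical, and the summary does not state which similarity function the authors use, so cosine similarity is an assumption here.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def cosine_similarity(a, b):
    """Cosine of the angle between two embeddings of equal dimension."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))

def rank_texts(image_emb, text_embs):
    """Return caption indices sorted from most to least similar to the image."""
    scores = [cosine_similarity(image_emb, t) for t in text_embs]
    return sorted(range(len(text_embs)), key=lambda i: scores[i], reverse=True)

# Toy embeddings (assumed values, not taken from the paper).
image_emb = [0.9, 0.1, 0.0]
text_embs = [
    [0.0, 1.0, 0.0],  # caption 0: mostly unrelated to the image
    [1.0, 0.0, 0.0],  # caption 1: closely aligned with the image
    [0.5, 0.5, 0.0],  # caption 2: partially aligned
]
ranking = rank_texts(image_emb, text_embs)
```

In this toy setting, a distracting entity or noisy word would shift an embedding away from its true match, which is the failure mode the background points describe.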

Purpose of the Study:

To propose an adaptive latent graph representation learning method for image-text matching.
To reduce distractions from irrelevant entities in images and noisy words in text.
To narrow the modality gap and improve matching performance.

Main Methods:

Utilized an improved graph variational autoencoder to disentangle distracting factors from latent relationship factors.
Jointly learned latent textual graph representations, latent visual graph representations, and a visual-textual graph embedding space.
Introduced an adaptive cross-attention mechanism that attends to features of the latent graph representations across modalities.
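
To make the cross-modal attention idea concrete, here is a minimal sketch of plain scaled dot-product cross-attention, in which textual graph nodes act as queries over visual graph nodes. The summary does not describe how the authors' mechanism is made adaptive, so this shows only the generic technique; the node features and the names softmax and cross_attention are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention from one modality (queries) to another.

    Returns the attended features, one per query, and the attention weights.
    """
    dim = len(keys[0])
    attended, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
                  for k in keys]
        w = softmax(scores)
        # Weighted sum of the value vectors.
        context = [sum(w[j] * values[j][c] for j in range(len(values)))
                   for c in range(len(values[0]))]
        weights.append(w)
        attended.append(context)
    return attended, weights

# Toy node features (assumed): two textual nodes attend over three visual nodes.
text_nodes = [[1.0, 0.0], [0.0, 1.0]]
visual_nodes = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
attended, weights = cross_attention(text_nodes, visual_nodes, visual_nodes)
```

Each row of weights sums to 1, and each textual node places its largest weight on the visual node it aligns with best. In the described method the inputs would be latent graph representations from the graph variational autoencoder rather than raw features.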

Main Results:

Experiments on the Flickr30K and COCO datasets demonstrated the effectiveness of the proposed method.
The adaptive latent graph approach successfully reduced distractions from irrelevant visual and textual elements.
The adaptive cross-attention mechanism further enhanced feature alignment between image and text modalities.

Conclusions:

The proposed adaptive latent graph representation learning method effectively addresses distractions in image-text matching.
The method successfully narrows the modality gap, leading to improved matching performance.
This approach offers a promising direction for future research in cross-modal retrieval and understanding.