Updated: May 21, 2026

MALFM-Captioner: A Multipath Alignment Learning for Image Captioning With Feature Mask.

Xiaobao Yang, Bohui Song, Yizhuo Dong

IEEE Transactions on Neural Networks and Learning Systems

May 19, 2026

Summary

This summary is machine-generated.

This study introduces MALFM-Captioner, a multipath alignment learning method for image captioning with a feature mask. The method improves image-text feature alignment in diffusion models, which in turn improves captioning accuracy and semantic fidelity.

Area of Science:

Computer Vision
Artificial Intelligence
Natural Language Processing

Background:

Diffusion models offer advantages over autoregressive methods in image captioning by avoiding token dependency.
However, noise in diffusion models can degrade sentence information and lead to poor image-text feature alignment.
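
To make the noise issue concrete, here is a minimal sketch of a generic Gaussian forward diffusion process applied to a toy sentence embedding. It is an illustration only: the summary does not state which noise process or schedule the paper uses, so the linear beta schedule, the step count, and the names `alpha_bar_schedule` and `noise_embedding` are all assumptions.

```python
import math
import random


def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factors for a linear beta schedule (assumed)."""
    alpha_bars, product = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        product *= 1.0 - beta
        alpha_bars.append(product)
    return alpha_bars


def noise_embedding(x0, alpha_bar, rng):
    """Sample x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps."""
    signal = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    return [signal * x + noise * rng.gauss(0.0, 1.0) for x in x0]


alpha_bars = alpha_bar_schedule(1000)
rng = random.Random(0)
x0 = [1.0, -1.0, 0.5, 0.0]  # toy stand-in for a sentence embedding

for t in (0, 499, 999):
    xt = noise_embedding(x0, alpha_bars[t], rng)
    # The signal coefficient shrinks as t grows, so less of x0 survives.
    print(t, round(math.sqrt(alpha_bars[t]), 3), [round(v, 2) for v in xt])
```

As the step index grows, the coefficient on the original embedding shrinks toward zero, which is one way to picture how heavily noised text features carry little sentence information to align with the image.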

Purpose of the Study:

To address the image-text misalignment issue in diffusion-based image captioning.
To enhance the discriminative visual representation learning and semantic fidelity of generated captions.

Main Methods:

Proposed MALFM-Captioner, a multipath alignment learning method for image captioning with a feature mask.
Introduced a feature masked module (FMM) to reconstruct masked visual information and enhance visual representations.
Implemented cross-attention between image and text features with weighted summation fusion and a gated feature fusion module (GFFM).
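
The components listed above can be pictured with a small, dependency-free sketch. This is not the authors' implementation: the summary gives no architectural details, so the masking ratio, the scalar gate parameters (`w_a`, `w_b`, `bias`), the fusion weight `alpha`, and all function names are hypothetical. A real model would use learned projections and would train a reconstruction objective for the masked features.

```python
import math
import random


def mask_features(features, mask_ratio, rng):
    """Zero out a random subset of region features (the masking step only)."""
    count = int(round(mask_ratio * len(features)))
    masked_idx = set(rng.sample(range(len(features)), count))
    masked = [[0.0] * len(f) if i in masked_idx else list(f)
              for i, f in enumerate(features)]
    return masked, sorted(masked_idx)


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query row attends over all keys."""
    scale = math.sqrt(len(keys[0]))
    attended = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        attended.append([sum(w * v[c] for w, v in zip(weights, values))
                         for c in range(len(values[0]))])
    return attended


def weighted_sum_fusion(a, b, alpha):
    """Fixed-weight fusion: alpha * a + (1 - alpha) * b, elementwise."""
    return [[alpha * x + (1.0 - alpha) * y for x, y in zip(ra, rb)]
            for ra, rb in zip(a, b)]


def gated_fusion(a, b, w_a, w_b, bias):
    """Elementwise sigmoid gate deciding how much of a versus b to keep."""
    fused = []
    for ra, rb in zip(a, b):
        row = []
        for x, y in zip(ra, rb):
            gate = 1.0 / (1.0 + math.exp(-(w_a * x + w_b * y + bias)))
            row.append(gate * x + (1.0 - gate) * y)
        fused.append(row)
    return fused


rng = random.Random(0)
image_feats = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(6)]  # 6 regions
text_feats = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(3)]   # 3 tokens

masked_feats, masked_idx = mask_features(image_feats, 0.5, rng)
text_to_image = cross_attention(text_feats, masked_feats, masked_feats)
mixed = weighted_sum_fusion(text_feats, text_to_image, alpha=0.5)
fused = gated_fusion(text_feats, text_to_image, w_a=1.0, w_b=1.0, bias=0.0)
print(masked_idx, len(fused), len(fused[0]))
```

The difference between the two fusion functions is the point of the sketch: weighted summation applies one fixed mixing weight everywhere, while a gate computes a separate mixing weight per element from the inputs themselves.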

Main Results:

MALFM-Captioner achieved improvements of 0.9% and 1.9% in the BLEU-4 and CIDEr metrics on the MS COCO and Flickr 30K datasets.
Demonstrated competitive performance against state-of-the-art models like DDCap and Bit Diffusion.
Effectively mitigated image-text misalignment and improved caption accuracy and semantic fidelity.
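
For readers unfamiliar with the metrics above, the following sketch shows the clipped n-gram precision that BLEU-4 is built on. It is deliberately partial: full BLEU-4 combines the precisions for n = 1 to 4 with a geometric mean and a brevity penalty, usually over multiple references, and CIDEr is not covered here. The function names and example sentences are illustrative and are not taken from the paper.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, reference, n):
    """Fraction of candidate n-grams found in the reference, with counts clipped."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return matched / total


candidate = "a dog runs on the grass".split()
reference = "a dog runs across the grass".split()

for n in range(1, 5):
    print(n, clipped_precision(candidate, reference, n))
```

Clipping prevents a caption from scoring well by repeating a single correct word, and the higher-order n-grams reward correct word order rather than just correct vocabulary.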

Conclusions:

The proposed MALFM-Captioner method effectively enhances image-text feature alignment in diffusion-based image captioning.
The model shows superior performance and semantic fidelity compared to existing state-of-the-art approaches.
MALFM-Captioner offers a promising direction for improving diffusion-based image captioning systems.