Updated: May 21, 2026

MALFM-Captioner: A Multipath Alignment Learning for Image Captioning With Feature Mask.

Xiaobao Yang, Bohui Song, Yizhuo Dong

    IEEE Transactions on Neural Networks and Learning Systems
    |May 19, 2026
    PubMed
    Summary
    This summary is machine-generated.

    This study introduces MALFM-Captioner, a multipath alignment learning method for image captioning with a feature mask. The method strengthens image-text feature alignment in diffusion models, improving captioning accuracy and semantic fidelity.

    Area of Science:

    • Computer Vision
    • Artificial Intelligence
    • Natural Language Processing

    Background:

    • Diffusion models offer advantages over autoregressive methods in image captioning by avoiding token dependency.
    • However, noise in diffusion models can degrade sentence information and lead to poor image-text feature alignment.
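The degradation described above can be illustrated with the standard Gaussian forward process used in DDPM-style diffusion. This is a minimal sketch under assumptions: the summary does not state which diffusion formulation or noise schedule the paper uses, so the linear beta schedule, the 1000-step horizon, and the names `alpha_bar` and `add_noise` are illustrative only.

```python
import math
import random

def alpha_bar(t, num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_s) for s = 1..t under a linear beta schedule (assumed)."""
    prod = 1.0
    for s in range(1, t + 1):
        beta = beta_start + (beta_end - beta_start) * (s - 1) / (num_steps - 1)
        prod *= 1.0 - beta
    return prod

def add_noise(x0, t, num_steps, rng):
    """Sample x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps, with eps ~ N(0, 1)."""
    ab = alpha_bar(t, num_steps)
    return [math.sqrt(ab) * v + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0) for v in x0]

rng = random.Random(0)
x0 = [1.0, -1.0, 0.5, 0.0]  # hypothetical embedding of one caption token
for t in (0, 250, 500, 1000):
    # The signal coefficient sqrt(alpha_bar_t) shrinks as t grows,
    # so less of the original sentence information survives.
    print(t, round(math.sqrt(alpha_bar(t, 1000)), 3), add_noise(x0, t, 1000, rng))
```

At late timesteps the noised vector carries almost none of the original token embedding, which is one way to picture why text features at those steps are hard to align with image features.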

    Purpose of the Study:

    • To address the image-text misalignment issue in diffusion-based image captioning.
    • To enhance the discriminative visual representation learning and semantic fidelity of generated captions.

    Main Methods:

    • Proposed a multipath alignment learning for image captioning with feature mask (MALFM-Captioner) method.
    • Introduced a feature masked module (FMM) to reconstruct masked visual information and enhance visual representations.
    • Implemented cross-attention between image and text features with weighted summation fusion and a gated feature fusion module (GFFM).
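The three components listed above can be sketched in a few lines of plain Python. This is not the authors' implementation: the summary gives no equations, so the masking ratio, the scalar gate parameters (`w_a`, `w_b`, `bias`), the fusion weight `alpha`, and the way the pieces are wired together are all assumptions made for illustration. The sketch also covers only the masking step of the FMM; reconstructing the masked features would require a learned decoder that is not shown.

```python
import math
import random

def mask_features(feats, ratio, rng):
    """Zero out a random subset of region feature vectors; return the masked copy and indices."""
    n = len(feats)
    idx = set(rng.sample(range(n), int(round(ratio * n))))
    masked = [[0.0] * len(f) if i in idx else list(f) for i, f in enumerate(feats)]
    return masked, sorted(idx)

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: every query attends over all keys."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, values)) for j in range(len(values[0]))])
    return out

def weighted_sum(a, b, alpha):
    """Fixed-weight fusion: alpha * a + (1 - alpha) * b, element-wise."""
    return [[alpha * x + (1.0 - alpha) * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def gated_fusion(a, b, w_a, w_b, bias):
    """Element-wise gate g = sigmoid(w_a*a + w_b*b + bias); output is g*a + (1-g)*b."""
    fused = []
    for ra, rb in zip(a, b):
        row = []
        for x, y in zip(ra, rb):
            g = 1.0 / (1.0 + math.exp(-(w_a * x + w_b * y + bias)))
            row.append(g * x + (1.0 - g) * y)
        fused.append(row)
    return fused

rng = random.Random(0)
image = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]  # 4 hypothetical regions
text = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)]   # 2 hypothetical tokens

masked, idx = mask_features(image, 0.5, rng)
context = cross_attention(text, masked, masked)  # text tokens attend over image regions
print("masked regions:", idx)
print("weighted sum:", weighted_sum(text, context, 0.5))
print("gated fusion:", gated_fusion(text, context, 1.0, 1.0, 0.0))
```

The difference between the two fusion functions is that the weighted sum applies one fixed mixing coefficient everywhere, while the gate computes a separate coefficient for each feature from the inputs themselves.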

    Main Results:

    • MALFM-Captioner achieved improvements of 0.9% and 1.9% in the BLEU-4 and CIDEr metrics on the MS COCO and Flickr30K datasets.
    • Demonstrated competitive performance against state-of-the-art models like DDCap and Bit Diffusion.
    • Effectively mitigated image-text misalignment and improved caption accuracy and semantic fidelity.
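For readers unfamiliar with BLEU-4, one of the metrics named above, the following is a minimal sentence-level sketch against a single reference, without smoothing. It is an illustration of the general metric only: captioning benchmarks typically report corpus-level scores over multiple references using standard evaluation toolkits, and nothing here reproduces the paper's evaluation setup. The function names `ngrams` and `bleu4` are hypothetical.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu4(candidate, reference):
    """Sentence-level BLEU-4 with one reference and no smoothing."""
    log_precisions = []
    for n in range(1, 5):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision gives a zero score
        log_precisions.append(math.log(clipped / total))
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1.0 - r / c)
    # Geometric mean of the four modified precisions, scaled by the brevity penalty.
    return brevity_penalty * math.exp(sum(log_precisions) / 4.0)

reference = "a dog runs across the grass".split()
print(bleu4("a dog runs across the green grass".split(), reference))
print(bleu4(reference, reference))
```

Because the score is a geometric mean over 1-gram to 4-gram precision, a caption must match the reference in longer word sequences, not just individual words, to score well.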

    Conclusions:

    • The proposed MALFM-Captioner method effectively enhances image-text feature alignment in diffusion-based image captioning.
    • The model shows superior performance and semantic fidelity compared to existing state-of-the-art approaches.
    • MALFM-Captioner offers a promising direction for improving diffusion-based image captioning systems.