DiCLIP: Diffusion Model Enhances CLIP's Dense Knowledge for Weakly Supervised Semantic Segmentation.

Zhiwei Yang, Pengfei Song, Yucong Meng

IEEE Transactions on Image Processing: A Publication of the IEEE Signal Processing Society

May 15, 2026

Summary

This summary is machine-generated.

DiCLIP improves weakly supervised semantic segmentation (WSSS) by using diffusion models to strengthen the dense knowledge of Contrastive Language-Image Pre-training (CLIP). The authors report better pixel-level predictions at lower training cost.

Area of Science:

Computer Vision
Artificial Intelligence
Machine Learning

Background:

Weakly Supervised Semantic Segmentation (WSSS) commonly uses Class Activation Maps (CAMs) to derive pixel-level predictions from image-level labels.
Contrastive Language-Image Pre-training (CLIP) has been adopted for CAM generation in WSSS, but its dense knowledge is limited in both the visual and text modalities, which leads to suboptimal CAMs.
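As a rough illustration of how a CLIP-style model can yield CAM-like maps from image-level labels, the sketch below scores each image patch against each class text embedding by cosine similarity and thresholds the result into pseudo-labels. It is a generic sketch under stated assumptions, not DiCLIP's procedure: the function names, the min-max normalization, the 0.5 threshold, and the random arrays standing in for CLIP features are all hypothetical.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def similarity_cams(patch_feats, text_embeds, grid_hw):
    """Build one CAM-like map per class from patch/text cosine similarity.

    patch_feats: (h*w, D) patch features (assumed to come from an image encoder)
    text_embeds: (C, D) class text embeddings (assumed to come from a text encoder)
    Returns an array of shape (C, h, w), min-max normalized per class.
    """
    h, w = grid_hw
    sims = l2_normalize(patch_feats) @ l2_normalize(text_embeds).T  # (h*w, C)
    cams = sims.T.reshape(-1, h, w)
    lo = cams.min(axis=(1, 2), keepdims=True)
    hi = cams.max(axis=(1, 2), keepdims=True)
    return (cams - lo) / (hi - lo + 1e-8)

def pseudo_labels(cams, present, threshold=0.5, background=0):
    """Turn maps into a label grid using only classes named by the image-level label.

    present: class indices marked present in the image-level label.
    Output labels are class index + 1, with 0 reserved for background.
    """
    present = np.asarray(present)
    masked = cams[present]                 # (K, h, w), absent classes are ignored
    best = masked.argmax(axis=0)           # most activated present class per patch
    conf = masked.max(axis=0)
    labels = present[best] + 1
    labels[conf < threshold] = background  # low activation falls back to background
    return labels

# Toy usage with random stand-in features: 4x4 patch grid, 3 classes, 8-dim embeddings.
rng = np.random.default_rng(0)
patch = rng.normal(size=(16, 8))
text = rng.normal(size=(3, 8))
cams = similarity_cams(patch, text, (4, 4))
labels = pseudo_labels(cams, present=[0, 2])
```

Restricting the argmax to classes listed in the image-level label is what makes the supervision "weak": the label says which classes appear, and the maps decide where.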

Purpose of the Study:

To introduce DiCLIP, a novel WSSS framework that enhances CLIP's dense knowledge using generative diffusion models.
To address the limitations of existing WSSS methods by improving spatial awareness and semantic representation.

Main Methods:

The proposed Visual Correlation Enhancement (VCE) module uses Attention Clustering Refinement (ACR) to improve spatial awareness and mitigate over-smoothing in CLIP's attention.
The Text Semantic Augmentation (TSA) module leverages diffusion models for a dynamic key-value cache, shifting to a visual knowledge retrieval paradigm that provides richer text semantics.
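The summary does not say how the key-value cache is built or queried, so the following is only a generic sketch of retrieval from such a cache, under assumed conventions: keys are feature vectors of reference images (here imagined to be diffusion-generated), values are one-hot class labels, and each query feature gathers class evidence from the keys it resembles. The function names and the `beta` and `alpha` parameters are hypothetical.

```python
import numpy as np

def build_cache(key_feats, class_ids, num_classes):
    """Store unit-norm reference features as keys and one-hot class labels as values."""
    keys = key_feats / np.linalg.norm(key_feats, axis=1, keepdims=True)
    values = np.eye(num_classes)[np.asarray(class_ids)]
    return keys, values

def cache_logits(queries, keys, values, beta=5.0):
    """Retrieve class evidence for each query feature from the cache.

    Affinity is exp(-beta * (1 - cosine similarity)), so close matches weigh
    near 1 and dissimilar keys decay toward 0. beta controls the sharpness.
    """
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    affinity = np.exp(-beta * (1.0 - q @ keys.T))  # (num_queries, num_keys)
    return affinity @ values                       # (num_queries, num_classes)

def fuse(text_logits, retrieved, alpha=1.0):
    """Add retrieved visual evidence to text-based logits with weight alpha."""
    return text_logits + alpha * retrieved

# Toy usage: three cached features over two classes, one query matching the first key.
keys, values = build_cache(
    np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.1]]), class_ids=[0, 1, 0], num_classes=2
)
retrieved = cache_logits(np.array([[1.0, 0.0]]), keys, values)
fused = fuse(np.zeros((1, 2)), retrieved)
```

In this formulation, updating the cache only means appending or replacing rows of `keys` and `values`, which is one way a cache could be made dynamic without retraining an encoder.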

Main Results:

DiCLIP significantly outperforms state-of-the-art methods on benchmark datasets such as PASCAL VOC and MS COCO.
The proposed framework demonstrates a notable reduction in training costs compared to existing approaches.

Conclusions:

DiCLIP effectively leverages generative diffusion models to enhance CLIP for improved WSSS performance.
The framework offers a more efficient and effective solution for dense prediction tasks in computer vision.