Text to Image for Multi-Label Image Recognition With Joint Prompt-Adapter Learning.

Chun-Mei Feng, Kai Yu, Xinxing Xu

    IEEE Transactions on Pattern Analysis and Machine Intelligence | May 26, 2025
    Summary
    This summary is machine-generated.

    T2I-PAL reduces the modality gap in vision-language models by generating images from text, improving multi-label image recognition performance without manual annotation. This method enhances parameter-efficient fine-tuning (PEFT) for models like CLIP.

    Area of Science:

    • Computer Vision
    • Machine Learning
    • Artificial Intelligence

    Background:

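    • Vision-language models (VLMs) like CLIP are pre-trained with image-text contrastive learning, which places images and text in a shared embedding space and supports parameter-efficient fine-tuning (PEFT).
    • A significant challenge is the modality gap between text and image embeddings, which limits performance when text captions stand in for training images (the text-as-images, TaI, approach).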
    • Multi-label image recognition (MLR) requires robust feature representation to handle multiple object classes within an image.
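
The summary does not say how the modality gap is measured. One common way to quantify it, assumed here for illustration only, is the Euclidean distance between the centroids of the L2-normalized image embeddings and the L2-normalized text embeddings. The function names and toy vectors below are hypothetical and are not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def modality_gap(image_embs, text_embs):
    """Distance between the centroids of normalized image and text embeddings.

    Assumption: this centroid-distance definition is one common choice,
    not necessarily the one used by T2I-PAL.
    """
    image_center = centroid([l2_normalize(v) for v in image_embs])
    text_center = centroid([l2_normalize(v) for v in text_embs])
    return math.dist(image_center, text_center)


# Toy 2-D embeddings (made up): the two modalities occupy different regions.
toy_image_embs = [[1.0, 0.1], [0.9, 0.2]]
toy_text_embs = [[0.1, 1.0], [0.2, 0.9]]
toy_gap = modality_gap(toy_image_embs, toy_text_embs)
```

Under this definition, a gap of zero means the two centroids coincide, and larger values mean the text and image embeddings sit further apart in the shared space.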

    Purpose of the Study:

    • To address the modality gap in VLMs for MLR using only text captions for PEFT.
    • To introduce T2I-PAL, a novel method that utilizes text-to-image generation to bridge the modality gap.
    • To enhance MLR performance and reduce the need for extensive manual annotation of training data.

    Main Methods:

    • Leveraging pre-trained text-to-image models to generate diverse, realistic images from text captions, reducing the text-image modality gap.
    • Incorporating a class-wise heatmap and learnable prototypes to aggregate local similarities for robust visual feature representation.
    • Combining prompt tuning and adapter learning for improved parameter-efficient fine-tuning (PEFT) and classification accuracy.
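
As a rough illustration of the second and third points above, the sketch below builds a class-wise heatmap from cosine similarities between local patch features and class prototypes, then aggregates each class's heatmap with a softmax-weighted sum. It also shows a residual blend of the kind adapters often use. The aggregation rule, the temperature, the blend ratio `alpha`, and all function names are assumptions made for illustration. The summary does not give the paper's exact formulation, and in a real model the prototypes would be trainable parameters rather than fixed lists.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(values, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp((v - peak) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def class_heatmap(patch_feats, prototypes):
    """heatmap[c][p] = similarity of patch p to the prototype of class c."""
    return [[cosine(patch, proto) for patch in patch_feats] for proto in prototypes]


def aggregate_scores(heatmap, temperature=0.1):
    """Softmax-weighted sum of local similarities, one score per class (assumed rule)."""
    scores = []
    for row in heatmap:
        weights = softmax(row, temperature)
        scores.append(sum(w * s for w, s in zip(weights, row)))
    return scores


def adapter_blend(original, adapted, alpha=0.5):
    """Residual mix of an adapted feature with the frozen original (assumed form)."""
    return [alpha * a + (1.0 - alpha) * o for o, a in zip(original, adapted)]


# Toy example: three patches, three class prototypes (all values made up).
toy_patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
toy_prototypes = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
toy_scores = aggregate_scores(class_heatmap(toy_patches, toy_prototypes))
```

In this toy case the first two classes each match at least one patch closely and receive high scores, while the third matches none and scores below zero. The low temperature makes the aggregation focus on the best-matching patches, which is the property that lets several classes be detected in one image.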

    Main Results:

    • T2I-PAL significantly reduces the modality gap between text and image representations.
    • The method enhances the robustness and informativeness of local visual features for MLR.
    • Experiments on MS-COCO, VOC2007, and NUS-WIDE benchmarks show an average performance boost of 3.47% over state-of-the-art methods.
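
The summary does not name the metric behind the 3.47% figure. Multi-label results on these benchmarks are conventionally reported as mean average precision (mAP), so the sketch below assumes that metric. It computes average precision per class from ranked scores and averages over classes. The function names and toy data are hypothetical.

```python
def average_precision(scores, labels):
    """Average precision for one class: mean precision at each positive's rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(score_matrix, label_matrix):
    """mAP over classes; rows are images and columns are classes."""
    num_classes = len(score_matrix[0])
    per_class = []
    for c in range(num_classes):
        class_scores = [row[c] for row in score_matrix]
        class_labels = [row[c] for row in label_matrix]
        per_class.append(average_precision(class_scores, class_labels))
    return sum(per_class) / num_classes


# Toy check (made-up data): two images, two classes, perfectly ranked.
toy_scores = [[0.9, 0.2], [0.1, 0.8]]
toy_labels = [[1, 0], [0, 1]]
toy_map = mean_average_precision(toy_scores, toy_labels)
```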

    Conclusions:

    • T2I-PAL effectively tackles the modality gap in vision-language models for multi-label image recognition.
    • The approach eliminates the need for fully semantically annotated training images, reducing manual annotation workload.
    • T2I-PAL preserves the CLIP model's intrinsic mode, enabling seamless integration with existing CLIP frameworks and improving recognition performance.