Text-to-Image for Multi-Label Image Recognition with Joint Prompt-Adapter Learning.

Chun-Mei Feng, Kai Yu, Xinxing Xu

IEEE transactions on pattern analysis and machine intelligence

May 26, 2025

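Summary

This summary is machine-generated.

T2I-PAL narrows the modality gap in vision-language models by generating images from text, which improves multi-label image recognition without manual annotation. The approach strengthens parameter-efficient fine-tuning (PEFT) of models such as CLIP.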

Scientific fields:

Computer vision
Machine learning
Artificial intelligence

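Background:

Vision-language models (VLMs) such as CLIP build on image-text contrastive learning and are adapted through parameter-efficient fine-tuning (PEFT).
A major challenge is the modality gap, which limits performance when text is used as a stand-in for images (text-as-image, TaI).
Multi-label image recognition (MLR) requires robust feature representations to handle multiple object classes in a single image.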
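
The modality gap mentioned above can be made concrete with a small measurement. The sketch below is only an illustration, not the paper's metric: it assumes one common way of quantifying the gap, namely the Euclidean distance between the centroids of L2-normalized image embeddings and L2-normalized text embeddings. The function names and the toy two-dimensional vectors are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length, as is usual for CLIP-style embeddings."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def modality_gap(image_embeddings, text_embeddings):
    """Distance between the image centroid and the text centroid.

    Assumption: this centroid-distance definition is one common choice;
    the source does not say how the gap is measured.
    """
    image_center = centroid([l2_normalize(v) for v in image_embeddings])
    text_center = centroid([l2_normalize(v) for v in text_embeddings])
    return math.dist(image_center, text_center)


# Toy embeddings (hypothetical values, two dimensions for readability).
image_embeddings = [[1.0, 0.0], [0.8, 0.6]]
text_embeddings = [[0.0, 1.0], [0.6, 0.8]]
gap = modality_gap(image_embeddings, text_embeddings)
```

Under this definition, a gap of zero means the two modalities share a centroid. Training on text features alone while testing on image features leaves the classifier exposed to whatever offset separates the two clouds.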

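Purpose of the study:

Bridge the modality gap in VLMs for MLR when PEFT relies on text captions alone.
Introduce T2I-PAL, a new method that uses text-to-image generation to close this gap.
Improve MLR performance while reducing the need for extensive manual annotation of training data.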

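Main methods:

Use a pretrained text-to-image model to generate diverse, realistic images from text captions, reducing the text-image modality gap.
Integrate class-wise heatmaps and learnable prototypes to aggregate local similarities into robust visual feature representations.
Combine prompt tuning with adapter learning to improve parameter-efficient fine-tuning (PEFT) and classification accuracy.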
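
The aggregation of local similarities described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each class is represented by one prototype vector, the class-wise heatmap is taken to be the cosine similarity between that prototype and every local patch feature, and the heatmap is pooled with a softmax-weighted average. The function names, the temperature value, and the toy features are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def class_heatmap(patch_features, prototype):
    """One similarity value per image patch for a single class."""
    return [cosine(patch, prototype) for patch in patch_features]


def aggregate(heatmap, temperature=0.1):
    """Softmax-weighted average: high-similarity patches dominate the score.

    Assumption: softmax pooling is used here for illustration only.
    """
    peak = max(heatmap)  # subtract the max for numerical stability
    weights = [math.exp((s - peak) / temperature) for s in heatmap]
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, heatmap)) / total


def multilabel_scores(patch_features, prototypes, temperature=0.1):
    """Score every class independently, as multi-label recognition requires."""
    return {
        name: aggregate(class_heatmap(patch_features, proto), temperature)
        for name, proto in prototypes.items()
    }


# Toy image with three patches: two resemble "cat", one resembles "dog".
patch_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
prototypes = {"cat": [1.0, 0.0], "dog": [0.0, 1.0], "car": [-1.0, -1.0]}
scores = multilabel_scores(patch_features, prototypes)
```

Because each class is pooled over its own heatmap, a small object that occupies a single patch can still score highly, which a single global image feature tends to miss. In a trained system the prototypes would be learned parameters rather than fixed vectors.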

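Main results:

T2I-PAL significantly reduces the modality gap between text and image representations.
The method makes local visual features for MLR more robust and more informative.
Experiments on the MS-COCO, VOC2007, and NUS-WIDE benchmarks show an average performance gain of 3.47% over state-of-the-art methods.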

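Conclusions:

T2I-PAL effectively addresses the modality gap in vision-language models for multi-label image recognition.
The approach removes the need for fully semantically annotated training images, reducing manual annotation effort.
T2I-PAL preserves the intrinsic mode of the CLIP model, so it can be integrated seamlessly into existing CLIP frameworks while improving recognition performance.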
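
One way to see why adapter-style tuning can leave a frozen CLIP backbone intact is a residual adapter, sketched below. This is an assumption made for illustration, not a description of T2I-PAL's actual adapter: a small two-layer bottleneck transforms the frozen feature, and a blend ratio `alpha` mixes the result with the original feature. The names `w_down`, `w_up`, and `alpha` are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def relu(vector):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in vector]


def residual_adapter(feature, w_down, w_up, alpha):
    """Blend an adapted feature with the frozen backbone feature.

    alpha = 0 returns the backbone feature unchanged;
    alpha = 1 uses the adapter output alone.
    """
    hidden = relu(matvec(w_down, feature))
    adapted = matvec(w_up, hidden)
    return [alpha * a + (1.0 - alpha) * f for a, f in zip(adapted, feature)]


# Toy frozen feature and identity-like adapter weights (hypothetical values).
frozen_feature = [0.5, -0.5]
w_down = [[1.0, 0.0], [0.0, 1.0]]
w_up = [[1.0, 0.0], [0.0, 1.0]]

unchanged = residual_adapter(frozen_feature, w_down, w_up, alpha=0.0)
blended = residual_adapter(frozen_feature, w_down, w_up, alpha=0.2)
```

Only `w_down` and `w_up` would be trained in such a setup, so the backbone weights stay untouched and the module can be attached to, or removed from, an existing CLIP pipeline without retraining it.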