Related Concept Videos

Observational Learning

Albert Bandura's observational learning, also known as imitation or modeling, occurs when a person observes and imitates another's behavior. It is a quicker process than operant conditioning. A well-known example is the Bobo doll study, where children who saw an adult acting aggressively towards the doll were more likely to act aggressively when left alone, compared to those who observed a nonaggressive adult. Many psychologists view observational learning as a form of latent learning...

Impression Management Techniques III: Aligning Actions

Aligning actions are communicative strategies individuals employ to maintain social harmony and preserve personal identity in the face of potential disruptions to social norms. These actions are particularly important in managing social impressions when one's behavior might be seen as inappropriate, incompetent, or morally questionable.
Types of Aligning Actions
The three principal types of aligning actions are disclaimers, accounts, and apologies.
Disclaimers
Disclaimers are preventive; they are...

Associative Learning

Associative learning is a fundamental concept in behavioral psychology, wherein a connection is established between two stimuli or events, leading to a learned response. This process is critical in understanding how behaviors are acquired and modified. Conditioning, the mechanism through which associations are formed, can be divided into two main types: classical conditioning and operant conditioning, each elucidating different aspects of associative learning.
Classical conditioning, also known...

Cognitive Learning

Cognitive learning is based on purposive behavior, incidental learning, and insight learning.
E. C. Tolman's theory of purposive behavior emphasizes that much behavior is goal-directed. He argued that to understand behavior, we must look at the entire sequence of actions leading to a goal. For instance, high school students study hard, not just due to past reinforcement but also to achieve the goal of getting into a good college.
Tolman introduced the idea that behavior is influenced by...

Vision

Vision is the result of light being detected and transduced into neural signals by the retina of the eye. This information is then further analyzed and interpreted by the brain. First, light enters the front of the eye and is focused by the cornea and lens onto the retina—a thin sheet of neural tissue lining the back of the eye. Because of refraction through the convex lens of the eye, images are projected onto the retina upside-down and reversed.

Purposive Learning

E. C. Tolman emphasized the purposiveness of behavior — the idea that much of our behavior is goal-directed. For instance, employees who aim for a promotion work diligently to meet their targets. Tolman argued that when classical conditioning and operant conditioning occur, the organism acquires certain expectations. In classical conditioning, a child might fear a dog because they expect it to bite. In operant conditioning, a person might consistently work overtime because they expect a...

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

Hierarchical Consistency Learning for Test-time Adaptation in Camouflage Perception.

IEEE transactions on image processing : a publication of the IEEE Signal Processing Society·2026

Same author

Knowledge Diffusion-Based Adaptive Alignment with Hierarchical Context for Video Temporal Grounding.

IEEE transactions on pattern analysis and machine intelligence·2026

Same author

OmniCharacter++: Towards Comprehensive Benchmark for Realistic Role-Playing Agents.

IEEE transactions on pattern analysis and machine intelligence·2026

Same author

Scalable and Efficient Deep Reinforcement Learning-Based Model Checker for Computation Tree Logic.

IEEE transactions on neural networks and learning systems·2026

Same author

From Channel Bias to Feature Redundancy: Uncovering the "Less Is More" Principle in Few-Shot Learning.

IEEE transactions on pattern analysis and machine intelligence·2026

Same author

SeMv-3D: Toward Concurrency of Semantic and Multi-View Consistency in General Text-to-3D Generation.

IEEE transactions on image processing : a publication of the IEEE Signal Processing Society·2026

Same journal

CLASH-CTTA: Class-Wise Shift-Aware Hierarchical Continual Test-Time Adaptation.

IEEE transactions on image processing : a publication of the IEEE Signal Processing Society·2026

Same journal

Voxel-based Point Cloud Geometry Compression with Space-to-Channel Context.

IEEE transactions on image processing : a publication of the IEEE Signal Processing Society·2026

Same journal

RIGI: Rectifying Image-to-3D Generation Inconsistency via Uncertainty-aware Learning.

IEEE transactions on image processing : a publication of the IEEE Signal Processing Society·2026

Same journal

DA-Cal: Towards Cross-Domain Calibration in Semantic Segmentation.

IEEE transactions on image processing : a publication of the IEEE Signal Processing Society·2026

Same journal

Multi-Dimensional Quality Assessment for Single-Image-to-3D Contents: Dataset and Model.

IEEE transactions on image processing : a publication of the IEEE Signal Processing Society·2026

Same journal

Enhancing Underwater Light Field Images via Global Geometry-aware Diffusion Process.

IEEE transactions on image processing : a publication of the IEEE Signal Processing Society·2026

Related Experiment Video

Simulation of a Scaled Assembly Process with Collaboration of a Robotic Arm and Monitoring through a Vision System for Quality Control

Published on: August 29, 2025

Vision-Language Collaborative Representation Learning for Action Quality Assessment.

Kumie Gedamu, Yanli Ji, Wangmeng Zuo

IEEE Transactions on Image Processing: A Publication of the IEEE Signal Processing Society, April 17, 2026

Summary

This summary is machine-generated.

This study introduces Vision-Language Collaboration Representation Learning (VLC-Net) for improved Action Quality Assessment (AQA). VLC-Net improves fine-grained action understanding and score prediction accuracy by unifying vision and language features.

More Related Videos

Photorealistic Learned Landscapes for Augmented Reality

Published on: June 27, 2025

Area of Science:

Computer Vision and Machine Learning
Multimodal AI
Action Recognition and Understanding

Background:

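Action Quality Assessment (AQA), which scores how well an action is performed (for example, a dive or a figure-skating routine), is crucial for real-world applications that require detailed comprehension of action sequences.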
Existing multimodal approaches for AQA often suffer from instability and suboptimal performance due to directional bias in vision-language embedding spaces.
Relying solely on textual information from language models limits the effectiveness of current AQA methods.

Purpose of the Study:

To propose a novel Vision-Language Collaboration Representation Learning approach (VLC-Net) for accurate AQA score prediction.
To develop a unified feature representation that captures temporal dependencies in fine-grained action sequences.
To overcome the limitations of existing methods by addressing directional bias and improving multimodal feature integration.

Main Methods:

Implemented a bidirectional knowledge distillation operation for collaborative learning between pre-trained vision-language models and visual action knowledge.
Designed vision-language alignment guidance to explicitly align action features with shared semantic meanings across modalities.
Applied multimodal contrastive learning to the aligned features to strengthen the relationships between modalities and between subactions and their textual descriptions.
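The three components above are described only at a high level in this summary. As a rough illustration of the kinds of objectives involved, the sketch below implements generic stand-ins in plain Python: a symmetric InfoNCE contrastive loss in the style of CLIP (clip i is paired with description i), a simple cosine alignment loss, and a bidirectional distillation term written as a symmetric KL divergence between two branches' temperature-softened predictions. These are assumptions for illustration, not VLC-Net's actual losses; all function names, temperature values, and toy feature vectors are hypothetical.

```python
import math

# Hypothetical stand-ins for the objectives named in "Main Methods";
# these are generic textbook losses, not the paper's formulations.

def log_softmax(logits):
    """Numerically stable log-softmax via the log-sum-exp trick."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def softmax(logits, temperature=1.0):
    """Softmax of logits divided by a temperature (higher = softer)."""
    return [math.exp(x) for x in log_softmax([x / temperature for x in logits])]

def cosine(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def symmetric_info_nce(video_feats, text_feats, temperature=0.07):
    """CLIP-style contrastive loss: pair i (video i, text i) is the positive and
    every other pairing in the batch is a negative; averaged over both directions."""
    n = len(video_feats)
    sim = [[cosine(v, t) / temperature for t in text_feats] for v in video_feats]
    loss = 0.0
    for i in range(n):
        loss -= log_softmax(sim[i])[i]                          # video -> text
        loss -= log_softmax([sim[j][i] for j in range(n)])[i]   # text -> video
    return loss / (2 * n)

def alignment_loss(video_feats, text_feats):
    """Mean (1 - cosine) over matched pairs; 0 when every pair points the same way."""
    pairs = list(zip(video_feats, text_feats))
    return sum(1.0 - cosine(v, t) for v, t in pairs) / len(pairs)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def bidirectional_distillation(logits_a, logits_b, temperature=2.0):
    """Each branch acts as teacher for the other: KL(p_a||p_b) + KL(p_b||p_a),
    scaled by T**2 as is conventional in temperature-softened distillation."""
    p_a = softmax(logits_a, temperature)
    p_b = softmax(logits_b, temperature)
    return (kl_divergence(p_a, p_b) + kl_divergence(p_b, p_a)) * temperature ** 2

if __name__ == "__main__":
    # Toy 3-D features for two clips and two sub-action descriptions (made up).
    videos = [[1.0, 0.2, 0.0], [0.0, 0.9, 0.3]]
    texts = [[0.9, 0.1, 0.0], [0.1, 1.0, 0.2]]
    print("contrastive:", symmetric_info_nce(videos, texts))
    print("alignment:", alignment_loss(videos, texts))
    print("distillation:", bidirectional_distillation([2.0, 0.5, -1.0], [1.5, 0.7, -0.5]))
```

The symmetric form of the contrastive loss treats both retrieval directions (clip to text and text to clip) equally; a practical implementation would operate on batched tensors in a deep-learning framework rather than Python lists.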

Main Results:

VLC-Net demonstrated superior performance in fine-grained action sequence understanding and AQA score prediction.
The proposed methods learned unified joint representations by aligning features across the vision and language modalities.
Experimental results on multiple datasets (FineDiving, MTL-AQA, FineFS, Fis-V) show significant improvements over state-of-the-art methods.
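The summary does not state which metrics underlie these comparisons. On FineDiving, MTL-AQA, and related AQA benchmarks, results are commonly reported (among other metrics) as Spearman's rank correlation between predicted scores and judges' scores, so the following dependency-free sketch of that metric is offered as an assumption about the evaluation setup. The function names and the sample score lists are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of equal values (a tie group).
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_rho(predicted, judged):
    """Spearman's rho: the Pearson correlation computed on the ranks."""
    return pearson(average_ranks(predicted), average_ranks(judged))

if __name__ == "__main__":
    # Made-up predicted vs. judge-assigned scores for five performances.
    predicted = [78.2, 64.5, 91.0, 55.3, 70.1]
    judged = [80.0, 60.5, 88.5, 58.0, 72.0]
    print(spearman_rho(predicted, judged))
```

Because rho depends only on rank order, a model that reproduces the judges' ordering reaches the maximum value of 1.0 even if its absolute scores are offset from the judges' scores.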

Conclusions:

VLC-Net effectively addresses the challenges of directional bias in vision-language embedding spaces for AQA.
The approach successfully learns unified representations of fine-grained actions by integrating visual and textual information.
The proposed method offers a robust and effective solution for accurate Action Quality Assessment, outperforming existing techniques.