



Linguistic-Driven Partial Semantic Relevance Learning for Skeleton-Based Action Recognition.

Qixiu Chen (1), Yingan Liu (1), Peng Huang (2)

  • (1) College of Information Science and Technology, Nanjing Forestry University, Nanjing 210037, China.

Sensors (Basel, Switzerland) | August 10, 2024
Summary
This summary is machine-generated.

This study introduces a new framework for skeleton-based action recognition that uses language descriptions to improve motion analysis. The Linguistic-Driven Partial Semantic Relevance Learning framework captures subtle action differences for more accurate human behavior representation.

Keywords:
cross-modal, skeleton-based action recognition, transformer


Area of Science:

  • Computer vision and motion analysis.
  • The intersection of natural language processing and skeleton-based action recognition.
  • Machine learning frameworks for human-computer interaction.

Background:

Motion analysis frequently uses skeletal data because it is robust to changes in environmental lighting and is computationally cheaper than dense video processing. Most existing frameworks prioritize the extraction of global skeletal features to classify human movements, which often results in the loss of localized semantic detail. These conventional approaches often fail to distinguish between actions that share similar global trajectories but differ significantly in fine-grained limb positioning or specific joint interactions. For example, the distinction between 'brush teeth' and 'brush hair' relies on subtle spatial relationships and limb-specific orientations that global descriptors might overlook during automated feature extraction. Raw coordinate points alone are insufficient for capturing the semantic nuances of diverse human behaviors, because the numbers carry no descriptive meaning of their own. The field currently lacks a mechanism to bridge the gap between low-level joint data and high-level linguistic concepts for localized movements, which limits the depth of behavioral understanding. This gap motivated the exploration of cross-modal learning strategies that use natural language to enhance the discriminative power of skeletal representations.
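To make the loss of local detail concrete, here is a minimal toy sketch. It assumes a made-up four-joint skeleton and a hypothetical `PARTS` grouping, and it uses plain averaging as a stand-in for a learned feature extractor; none of this comes from the paper. Two poses that differ only in which arm is raised collapse to the same global descriptor, while per-part descriptors still tell them apart.

```python
from statistics import fmean

# Hypothetical grouping of joint indices into body parts (not from the paper).
PARTS = {"left_arm": [0, 1], "right_arm": [2, 3]}

def global_feature(joints):
    """Average all joint coordinates into a single (x, y) descriptor."""
    xs, ys = zip(*joints)
    return (fmean(xs), fmean(ys))

def part_features(joints):
    """Average joint coordinates separately within each body part."""
    return {
        name: global_feature([joints[i] for i in indices])
        for name, indices in PARTS.items()
    }

# Two toy poses: the raised arm is swapped, everything else is identical.
pose_a = [(-1.0, 0.5), (-1.0, 0.5), (1.0, 1.5), (1.0, 1.5)]
pose_b = [(-1.0, 1.5), (-1.0, 1.5), (1.0, 0.5), (1.0, 0.5)]

# The global descriptors coincide, so a global-only model cannot separate the poses.
print(global_feature(pose_a) == global_feature(pose_b))
# The per-part descriptors differ, so the local detail is still available.
print(part_features(pose_a) == part_features(pose_b))
```

Real systems learn features rather than averaging coordinates, but the same effect can occur whenever pooling over the whole body discards which limb carried the motion.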

Purpose of the Study:

The Linguistic-Driven Partial Semantic Relevance Learning (LPSR) framework integrates detailed linguistic descriptions into the skeletal feature learning process to capture highly discriminative behavior representations for advanced motion analysis. Researchers sought to address the limitations of global feature extraction by focusing on the semantic relationships among various partial limb motions that define specific human activities. The study leverages the descriptive power of large language models to provide a more holistic and semantically rich representation of human actions than previously possible with coordinate data. By incorporating fine-grained language, the architecture attempts to resolve ambiguities between actions that appear similar at a global scale but possess distinct local characteristics. The project focuses on modeling the implicit correlations between different body parts to improve classification accuracy and robustness in complex motion analysis scenarios where joint occlusion might occur. The investigation targets the development of a generalized cross-modal behavioral representation that combines textual and skeletal modalities into a single, cohesive learning objective for neural network training. This approach seeks to establish a new standard for how skeletal data is interpreted by aligning it with the way humans naturally describe motion through descriptive language.

Main Methods:

The team developed the Linguistic-Driven Partial Semantic Relevance Learning (LPSR) framework to fuse skeletal coordinates with natural language generated by artificial intelligence. State-of-the-art Large Language Models (LLMs) were employed to generate specific linguistic descriptions of local limb motions, providing a semantic anchor for the raw skeletal data points. These textual descriptions served as constraints during the learning phase to refine the representation of local skeletal movements and keep it aligned with human-understandable concepts of motion. The architecture aggregates global skeleton point representations with the generated textual data to create a unified feature space that benefits from both geometric precision and semantic information. A cyclic attentional interaction module was implemented to model the implicit correlations between partial limb motions across the entire human body during action sequences. The researchers conducted ablation experiments to evaluate the contribution of each component of the LPSR system to the final recognition accuracy. The framework was also compared against existing state-of-the-art models on action recognition benchmarks to evaluate its accuracy and computational efficiency.
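One common way to let text act as a constraint on learned features is to penalize the angle between each body-part feature and the embedding of its description. The sketch below assumes a cosine-based alignment loss, which is a stand-in technique chosen for illustration and not the loss reported by the authors. The function names and the toy 3-dimensional embeddings are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def alignment_loss(part_feats, text_feats):
    """Mean of (1 - cosine) over body parts; 0 when every pair points the same way."""
    names = sorted(part_feats)
    return sum(1.0 - cosine(part_feats[n], text_feats[n]) for n in names) / len(names)

# Toy embeddings; in practice these would come from skeleton and text encoders.
skeleton = {"arm": [1.0, 0.0, 0.0], "leg": [0.0, 2.0, 0.0]}
text_aligned = {"arm": [2.0, 0.0, 0.0], "leg": [0.0, 1.0, 0.0]}
text_orthogonal = {"arm": [0.0, 1.0, 0.0], "leg": [1.0, 0.0, 0.0]}

print(alignment_loss(skeleton, text_aligned))     # low loss: directions agree
print(alignment_loss(skeleton, text_orthogonal))  # higher loss: directions disagree
```

Minimizing such a term during training would pull each part feature toward its description, which is one plausible reading of "constraints during the learning phase".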

Main Results:

The Linguistic-Driven Partial Semantic Relevance Learning framework achieved state-of-the-art results across standard action recognition datasets, outperforming traditional models that rely solely on skeletal coordinates for motion classification. Experimental data indicated that integrating fine-grained linguistic descriptions significantly improves the discriminative capacity of skeletal features by providing context that raw numerical data lacks during training. The cyclic attentional interaction module captured the subtle dependencies between limb movements that global methods typically ignore, leading to more precise action classification in complex scenarios. Ablation studies showed that the combination of textual and skeletal modalities outperforms single-modality approaches, supporting the efficacy of the cross-modal learning strategy for motion analysis. The system distinguished between semantically similar actions, such as 'brush teeth' and 'brush hair,' by using local limb constraints derived from the descriptions generated by the large language model. The results indicated that the LPSR framework provides a more generalized representation of human behavior than previous global-only models, making it more robust to variations in individual movement styles. These findings position the LPSR framework as a leading approach for motion analysis tasks that require a high level of semantic precision.

Conclusions:

The integration of linguistic semantics into skeletal motion analysis represents a significant advance for action recognition and human-computer interaction. These findings suggest that cross-modal learning can overcome the inherent limitations of raw coordinate-based skeletal data by providing a semantic bridge to human language and conceptual understanding. The LPSR framework offers a scalable way to improve the accuracy of motion analysis in environmental conditions where lighting and background noise might interfere with traditional video systems. Future research may apply these linguistic-driven techniques to other areas of human-computer interaction, such as robotic perception, automated surveillance systems, and physical therapy monitoring. The study underscores the importance of modeling partial limb motions to understand complex human activities that share global similarities but differ in detail. The researchers conclude that leveraging large language models for local motion description is a viable strategy for enhancing behavioral representations in machine learning models. This work paves the way for more intuitive and semantically aware systems that can interpret human actions with nuance and context closer to that of a human observer.

The LPSR framework utilizes linguistic descriptions to constrain the learning of local limb motions, allowing the system to identify subtle differences in joint positioning. This approach enables the model to differentiate between actions like 'brush teeth' and 'brush hair' that share nearly identical global skeletal trajectories.

The researchers used state-of-the-art Large Language Models to generate fine-grained linguistic descriptions of specific limb movements. These textual representations are then aggregated with global skeleton point data to create a generalized cross-modal representation that enhances the discriminative power of the action recognition system.
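As a rough illustration of how per-limb descriptions might be requested from a language model, the sketch below only builds the prompt strings. The body-part list, the prompt wording, and the function name `build_part_prompts` are all assumptions made for this example; the prompts actually used in the study are not given here, and no model is called.

```python
# Hypothetical body-part list and prompt wording; not taken from the paper.
BODY_PARTS = ["head", "hands", "arms", "hips", "legs", "feet"]

def build_part_prompts(action, parts=BODY_PARTS):
    """Return one prompt per body part asking for that part's motion during `action`."""
    return {
        part: (
            f"Describe in one sentence the motion of the {part} "
            f"when a person performs the action '{action}'."
        )
        for part in parts
    }

prompts = build_part_prompts("brush teeth")
for part, prompt in prompts.items():
    print(f"{part}: {prompt}")
```

Each returned string would be sent to whichever language model is available, and the replies would then be embedded by a text encoder before being combined with skeletal features.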

The cyclic attentional interaction module was designed to model the implicit correlations between various partial limb motions across the skeletal structure. By capturing these dependencies, the module allows the LPSR framework to integrate localized movement data into a more holistic and accurate representation of human behavior.
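The internals of the cyclic attentional interaction module are not described in this summary. The sketch below shows one possible interpretation under stated assumptions: each body part takes a turn as the query, attends to all other parts with scaled dot-product attention, and adds the result to its own feature. The function names and the toy 2-dimensional features are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, others):
    """Scaled dot-product attention of one query vector over a list of vectors."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, other)) / scale for other in others]
    weights = softmax(scores)
    return [
        sum(w * other[d] for w, other in zip(weights, others))
        for d in range(len(query))
    ]

def cyclic_interaction(parts):
    """Let each part, in turn, query all other parts and add the context residually."""
    updated = []
    for i, query in enumerate(parts):
        others = parts[:i] + parts[i + 1:]
        context = attend(query, others)
        updated.append([q + c for q, c in zip(query, context)])
    return updated

# Toy features for three body parts (values are made up).
parts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in cyclic_interaction(parts):
    print([round(x, 3) for x in row])
```

Because every part is updated from the original features of the others, the result does not depend on the order in which parts are visited. A learned version would add projection matrices for queries, keys, and values.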

The study's authors indicate that global skeleton features often overlook the potential semantic relationships among various partial limb motions. This limitation makes it difficult for traditional models to capture the nuances of complex actions that are primarily distinguished by specific, localized joint movements rather than overall body displacement.

The study's authors propose that integrating detailed linguistic descriptions into the learning process is essential for capturing more discriminative skeleton behavior representations. They conclude that this cross-modal approach provides a more generalized and effective framework for motion analysis than methods relying on skeletal points alone.