LLaVA-Pose: Keypoint-Integrated Instruction Tuning for Human Pose and Action Understanding.

Dewen Zhang¹, Tahir Hussain¹, Wangpeng An²

¹Department of Informatics, Graduate School of Informatics and Engineering, The University of Electro-Communications, Tokyo 182-8585, Japan.

Sensors (Basel, Switzerland)

August 28, 2025

Summary

This summary is machine-generated.

This study introduces keypoint-integrated data to improve vision-language models (VLMs) for understanding human poses and actions. Fine-tuning with this specialized dataset significantly enhances VLM performance on human-centric tasks.

Keywords:

human pose and action understanding; instruction-following data; keypoint-integrated data generation; multimodal instruction tuning; vision–language models

Area of Science:

Computer Vision
Artificial Intelligence
Multimodal Learning

Background:

Current vision-language models (VLMs) excel at general visual tasks but struggle with complex human pose and action recognition.
This limitation stems from a lack of specialized instruction-following data for human-centric visual understanding.

Purpose of the Study:

To develop a method for generating specialized vision-language data integrating human keypoints with traditional visual features.
To create a comprehensive dataset for fine-tuning VLMs on human-centric tasks, including conversation, detailed description, and complex reasoning.
To establish a benchmark for evaluating model performance in human pose and action understanding.

Main Methods:

Integrated human keypoint data with existing visual features such as captions and bounding boxes.
Constructed a dataset of 200,328 samples focused on human-centric tasks.
Established the Extended Human Pose and Action Understanding Benchmark (E-HPAUB).
Fine-tuned the LLaVA-1.5-7B model using the generated dataset to create the LLaVA-Pose model.
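
As a rough illustration of what "integrating keypoints with captions and bounding boxes" could look like in practice, the sketch below assembles a plain-text context for one image. It is not the authors' pipeline: the COCO-style keypoint layout (17 named joints stored as flat `[x, y, v]` triples, with `v = 0` meaning unlabeled), the `[x, y, width, height]` box layout, and the function names `format_keypoints` and `build_context` are all assumptions made for this example.

```python
from typing import Dict, List, Sequence

# Assumption: COCO-style keypoint order. The summary does not name a keypoint format.
KEYPOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]


def format_keypoints(flat: Sequence[float]) -> str:
    """Render a flat [x, y, v, ...] list as 'name: (x, y)' text.

    Keypoints with visibility flag v == 0 (unlabeled) are skipped.
    """
    if len(flat) != 3 * len(KEYPOINT_NAMES):
        raise ValueError("expected 17 keypoints as flat [x, y, v] triples")
    parts = []
    for i, name in enumerate(KEYPOINT_NAMES):
        x, y, v = flat[3 * i: 3 * i + 3]
        if v > 0:
            parts.append(f"{name}: ({x:.0f}, {y:.0f})")
    return ", ".join(parts)


def build_context(caption: str, people: List[Dict]) -> str:
    """Combine a caption with per-person boxes and keypoints (hypothetical layout)."""
    lines = [f"Caption: {caption}"]
    for idx, person in enumerate(people, start=1):
        x, y, w, h = person["bbox"]  # assumed [x, y, width, height]
        lines.append(f"Person {idx} bbox: [{x:.0f}, {y:.0f}, {w:.0f}, {h:.0f}]")
        lines.append(f"Person {idx} keypoints: {format_keypoints(person['keypoints'])}")
    return "\n".join(lines)


if __name__ == "__main__":
    keypoints = [0.0] * 51
    keypoints[0:3] = [100, 50, 2]  # only the nose is labeled in this toy example
    print(build_context("A man runs.", [{"bbox": [10, 20, 30, 40], "keypoints": keypoints}]))
```

A text context of this kind could then be paired with a task prompt (conversation, detailed description, or complex reasoning) to produce one instruction-following sample; how the authors actually do this is not described in the summary.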

Main Results:

The LLaVA-Pose model demonstrated significant improvements on the E-HPAUB benchmark.
Achieved an overall performance increase of 33.2% compared to the baseline LLaVA-1.5-7B model.
Validated the effectiveness of keypoint-integrated data for enhancing human-centric visual understanding.
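
The summary does not say whether the 33.2% figure is a relative or an absolute gain, and it does not report the underlying scores. Purely as an illustration of how a relative increase over a baseline is computed, here is a minimal sketch; the function name `relative_increase` and the scores in the usage example are hypothetical and are not taken from the paper.

```python
def relative_increase(baseline: float, improved: float) -> float:
    """Return the percentage increase of `improved` over `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return (improved - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Hypothetical benchmark scores, chosen only to show the arithmetic.
    print(relative_increase(50.0, 75.0))
```

Under an absolute reading, the figure would instead be the plain difference between the two scores in percentage points.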

Conclusions:

Keypoint-integrated data is crucial for advancing VLMs in understanding complex human poses and actions.
The proposed method and dataset effectively improve multimodal model capabilities for human-centric visual tasks.