Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

An Upper-Limb Motor Imagery EEG Dataset of Chronic Stroke Patients.

Scientific data·2026

Same author

Mutual Generation for Cross-domain Challenge in Stroke Patients' Motor Imagery Classification and Functional Recovery Prediction.

IEEE journal of biomedical and health informatics·2025

Same author

EGCN++: A New Fusion Strategy for Ensemble Learning in Skeleton-Based Rehabilitation Exercise Assessment.

IEEE transactions on pattern analysis and machine intelligence·2024

Same author

Human Eye Movements Reveal Video Frame Importance.

Computer·2021

Same author

Steganographer detection via a similarity accumulation graph convolutional network.

Neural networks : the official journal of the International Neural Network Society·2021

Same author

Determining dependency and redundancy for identifying gene-gene interaction associated with complex disease.

Journal of bioinformatics and computational biology·2020

Same journal

Relation DETR+: Exploring Explicit Position Relation Prior for Dense Prediction.

IEEE transactions on pattern analysis and machine intelligence·2026

Same journal

RBF++: Quantifying and Optimizing Reasoning Boundaries across Measurable and Unmeasurable Capabilities for Chain-of-Thought Reasoning.

IEEE transactions on pattern analysis and machine intelligence·2026

Same journal

CAFE: Cross-View Adaptive Fusion and Cluster Center Enhancement for Robust Multi-View Clustering.

IEEE transactions on pattern analysis and machine intelligence·2026

Same journal

DIVER: Reinforced Diffusion Breaks Imitation Bottlenecks in End-to-End Autonomous Driving.

IEEE transactions on pattern analysis and machine intelligence·2026

Same journal

Ethics-Aware Safe Reinforcement Learning for Rare-Event Risk Control in Interactive Urban Driving.

IEEE transactions on pattern analysis and machine intelligence·2026

Same journal

Learning Shape Anchors for Holistic Indoor Scene Understanding.

IEEE transactions on pattern analysis and machine intelligence·2026

Related Experiment Video

Updated: Sep 22, 2025

Combining Eye-tracking Data with an Analysis of Video Content from Free-viewing a Video of a Walk in an Urban Park Environment

Published on: May 7, 2019

MMNet: A Model-Based Multimodal Network for Human Action Recognition in RGB-D Videos.

Bruce X B Yu, Yan Liu, Xiang Zhang

IEEE Transactions on Pattern Analysis and Machine Intelligence

May 26, 2022

Summary

This summary is machine-generated.

This study introduces a novel model-based multimodal network (MMNet) for human action recognition (HAR) in RGB-D videos. MMNet fuses skeleton and RGB data and outperforms existing methods on five benchmark datasets.

More Related Videos

Author Spotlight: Addressing Technical and Subjective Challenges in Measuring Classroom Attention

Published on: December 15, 2023

A Step-by-Step Implementation of DeepBehavior, Deep Learning Toolbox for Automated Behavior Analysis

Published on: February 6, 2020

Area of Science:

Computer Vision
Artificial Intelligence
Machine Learning

Background:

Human Action Recognition (HAR) in RGB-D videos is a growing field.
Unimodal approaches (skeleton-based, RGB-based) have advanced significantly.
Multimodal methods, especially model-level fusion, remain underexplored.

Purpose of the Study:

To propose a model-based multimodal network (MMNet) for fusing skeleton and RGB data.
To enhance ensemble recognition accuracy by leveraging complementary information from different modalities.
To improve the discriminative power of features for HAR.

Main Methods:

Developed a model-based multimodal network (MMNet).
Employed a spatiotemporal graph convolutional network on the skeleton data to learn attention weights.
Transferred the learned attention weights from the skeleton network to the RGB-modality network.
Utilized model-level fusion to combine skeleton and RGB information.
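The methods above can be illustrated with a minimal, pure-Python sketch of how attention learned on the skeleton stream might steer the RGB stream. It is not MMNet's implementation: the single-frame spatial graph convolution (the temporal convolution is omitted), the dot-product joint attention with a hypothetical learned `query` vector, the pooling of per-joint RGB patch features, and the weighted score ensemble with an assumed weight `alpha` are all illustrative assumptions.

```python
# Hypothetical sketch: skeleton graph conv -> joint attention -> attention-weighted RGB features.
# Shapes, names, the "query" vector and the alpha-weighted score ensemble are assumptions,
# not the published MMNet architecture.
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def normalized_adjacency(num_joints: int, edges: Sequence[Tuple[int, int]]) -> Matrix:
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} of an undirected skeleton graph."""
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)] for i in range(num_joints)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
            for i in range(num_joints)]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain nested-list matrix product."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def graph_conv(a_hat: Matrix, features: Matrix, weight: Matrix) -> Matrix:
    """One spatial graph-convolution layer, ReLU(A_hat X W), for a single frame."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, features), weight)]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def joint_attention(hidden: Matrix, query: Sequence[float]) -> List[float]:
    """Attention over joints: softmax of each joint's hidden feature dotted with a query."""
    return softmax([sum(h * q for h, q in zip(row, query)) for row in hidden])


def attended_rgb(patch_features: Matrix, attention: Sequence[float]) -> List[float]:
    """Pool per-joint RGB patch features, weighted by the transferred skeleton attention."""
    dim = len(patch_features[0])
    return [sum(w * patch[d] for w, patch in zip(attention, patch_features))
            for d in range(dim)]


def ensemble(skeleton_scores: Sequence[float], rgb_scores: Sequence[float],
             alpha: float = 0.5) -> List[float]:
    """Weighted average of the two streams' class scores (alpha is an assumed weight)."""
    return [alpha * s + (1 - alpha) * r for s, r in zip(skeleton_scores, rgb_scores)]


if __name__ == "__main__":
    # Toy 3-joint chain (e.g. shoulder-elbow-wrist) with 2-D joint features.
    a_hat = normalized_adjacency(3, [(0, 1), (1, 2)])
    joints = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0]]
    hidden = graph_conv(a_hat, joints, [[1.0, 0.0], [0.0, 1.0]])
    attn = joint_attention(hidden, query=[1.0, 0.0])
    # Hypothetical 4-D RGB features cropped around each joint.
    patches = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
    print("joint attention:", [round(w, 3) for w in attn])
    print("attended RGB feature:", [round(v, 3) for v in attended_rgb(patches, attn)])
    print("ensemble scores:", ensemble([0.7, 0.3], [0.4, 0.6]))
```

Running the script prints the attention over the three toy joints, the attention-weighted RGB feature, and the ensembled class scores. In a trained model the attention would be learned end to end over joints and frames; here it comes from hand-picked weights so the data flow from the skeleton stream to the RGB stream stays easy to follow.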

Main Results:

MMNet outperformed state-of-the-art approaches on six evaluation protocols across five benchmark datasets (NTU RGB+D 60/120, PKU-MMD, Northwestern-UCLA, Toyota Smarthome).
The method demonstrated consistent performance on the Kinetics 400 RGB video dataset, indicating robustness.
Effectively captured complementary features between the RGB and skeleton modalities.

Conclusions:

The proposed MMNet effectively fuses skeleton and RGB modalities for improved HAR.
MMNet provides more discriminative features by capturing complementary information.
The model-level fusion approach offers a promising direction for multimodal HAR research.