Updated: Sep 22, 2025

MMNet: A Model-Based Multimodal Network for Human Action Recognition in RGB-D Videos.

Bruce X B Yu, Yan Liu, Xiang Zhang

    IEEE Transactions on Pattern Analysis and Machine Intelligence | May 26, 2022 | PubMed
    Summary
    This summary is machine-generated.

    This study introduces a novel multimodal network (MMNet) for human action recognition (HAR) using RGB-D videos. MMNet effectively fuses skeleton and RGB data, outperforming existing methods on multiple datasets.

    Area of Science:

    • Computer Vision
    • Artificial Intelligence
    • Machine Learning

    Background:

    • Human Action Recognition (HAR) in RGB-D videos is a growing field.
    • Unimodal approaches (skeleton-based, RGB-based) have advanced significantly.
    • Multimodal methods, especially model-level fusion, remain underexplored.

    Purpose of the Study:

    • To propose a model-based multimodal network (MMNet) for fusing skeleton and RGB data.
    • To enhance ensemble recognition accuracy by leveraging complementary information from different modalities.
    • To improve the discriminative power of features for HAR.

    Main Methods:

    • Developed a model-based multimodal network (MMNet).
    • Employed a spatiotemporal graph convolutional network on skeleton data to learn attention weights.
    • Transferred learned attention weights from skeleton to RGB modality network.
    • Utilized model-level fusion to combine skeleton and RGB information.
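
The attention-transfer and fusion steps listed above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`attend_rgb_regions`, `fuse_scores`, `predict`), the per-joint region features, and the equal-weight averaging of class probabilities are all assumptions chosen for illustration. In particular, the plain score averaging used here is a simple late-fusion stand-in for the fusion described in the paper.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend_rgb_regions(joint_scores, region_features):
    """Pool RGB region features using skeleton-derived attention.

    joint_scores: one importance score per body joint, assumed to come
        from a skeleton branch (hypothetical input).
    region_features: one RGB feature vector per joint region, in the
        same order as joint_scores (hypothetical input).
    """
    weights = softmax(joint_scores)  # normalize scores into attention weights
    dim = len(region_features[0])
    pooled = [0.0] * dim
    for w, feat in zip(weights, region_features):
        for i, v in enumerate(feat):
            pooled[i] += w * v  # attention-weighted sum across regions
    return pooled


def fuse_scores(skeleton_logits, rgb_logits, alpha=0.5):
    """Average class probabilities of the two branches (assumed late fusion)."""
    p_skel = softmax(skeleton_logits)
    p_rgb = softmax(rgb_logits)
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(p_skel, p_rgb)]


def predict(fused):
    """Return the index of the highest fused class probability."""
    return max(range(len(fused)), key=fused.__getitem__)
```

For example, `attend_rgb_regions([0.0, 0.0], [[1.0, 3.0], [3.0, 5.0]])` weights both regions equally and returns their mean, and `predict(fuse_scores(skel, rgb))` picks the class that the two branches jointly favor. The `alpha` parameter is a hypothetical knob for trading off the two modalities.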

    Main Results:

    • MMNet outperformed state-of-the-art approaches on six evaluation protocols across five benchmark datasets (NTU RGB+D 60, NTU RGB+D 120, PKU-MMD, Northwestern-UCLA, and Toyota Smarthome).
    • The method demonstrated consistent performance on the Kinetics 400 RGB video dataset, indicating robustness.
    • Achieved effective capture of complementary features between RGB and skeleton modalities.

    Conclusions:

    • The proposed MMNet effectively fuses skeleton and RGB modalities for improved HAR.
    • MMNet provides more discriminative features by capturing complementary information.
    • The model-level fusion approach offers a promising direction for multimodal HAR research.