Video-Based Human Activity Recognition Using Deep Learning Approaches.

Guilherme Augusto Silva Surek¹, Laio Oriel Seman², Stefano Frizzo Stefenon³,⁴

¹Industrial and Systems Engineering Graduate Program (PPGEPS), Pontifical Catholic University of Parana (PUCPR), Curitiba 80215-901, Brazil.

Sensors (Basel, Switzerland)

July 29, 2023

Summary

This summary is machine-generated.

This study evaluates deep learning models for human activity recognition, namely the Vision Transformer (ViT) and the Residual Network (ResNet), using a semi-supervised learning approach together with DINO (self-DIstillation with NO labels). The ViT architecture shows promising results for complex action recognition in videos.

Keywords:

convolutional neural network; deep learning; self-DIstillation with NO labels (DINO); video human action recognition; vision transformer architecture

Area of Science:

Computer Science
Artificial Intelligence
Machine Learning

Background:

Human activity recognition (HAR) is crucial for analyzing human behavior using sensor data.
Recognizing actions in videos with multiple interacting entities requires advanced spatial modeling.
Deep learning models offer powerful tools for visual reasoning in action recognition tasks.

Purpose of the Study:

To evaluate and map the current state of human action recognition in RGB videos using deep learning.
To assess the performance of Residual Network (ResNet) and Vision Transformer (ViT) architectures.
To investigate the impact of semi-supervised learning and DINO (self-DIstillation with NO labels) on HAR.

Main Methods:

Implemented and evaluated ResNet and ViT architectures with a semi-supervised learning approach.
Utilized DINO (self-DIstillation with NO labels) to enhance model capabilities.
Tested models on the Human Motion Database (HMDB51) benchmark for action recognition.
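
The self-distillation idea behind DINO can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the commonly described DINO formulation, in which a student network is trained to match a centered, temperature-sharpened teacher distribution and the teacher is an exponential moving average of the student. The function names and the temperature defaults (0.1 for the student, 0.04 for the teacher) are assumptions chosen for illustration.

```python
import math


def log_softmax(logits, temperature):
    """Temperature-scaled log-softmax, computed with the log-sum-exp trick."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_norm for x in scaled]


def dino_loss(student_logits, teacher_logits, center,
              student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy from the centered, sharpened teacher to the student.

    The temperature defaults are illustrative assumptions, not values
    taken from the study summarized above.
    """
    centered = [t - c for t, c in zip(teacher_logits, center)]
    teacher_probs = [math.exp(v) for v in log_softmax(centered, teacher_temp)]
    student_log_probs = log_softmax(student_logits, student_temp)
    return -sum(p * lq for p, lq in zip(teacher_probs, student_log_probs))


def ema_update(old, new, momentum):
    """Exponential moving average, usable for teacher weights or the center."""
    return [momentum * o + (1.0 - momentum) * n for o, n in zip(old, new)]


if __name__ == "__main__":
    teacher = [1.8, 0.7, -0.9]   # hypothetical teacher outputs for one view
    center = [0.0, 0.0, 0.0]     # hypothetical running center
    agree = dino_loss([2.0, 0.5, -1.0], teacher, center)
    disagree = dino_loss([-1.0, 0.5, 2.0], teacher, center)
    print(f"loss when student agrees with teacher:    {agree:.4f}")
    print(f"loss when student disagrees with teacher: {disagree:.4f}")
```

The loss is small when the student puts its probability mass on the same class as the teacher and large otherwise. No labels enter the computation, which is the sense in which the method works "with no labels".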

Main Results:

The Vision Transformer (ViT) architecture demonstrated promising performance in video classification.
A bi-dimensional ViT combined with Long Short-Term Memory (LSTM) was evaluated on the HMDB51 dataset.
The ViT-LSTM model reached 96.7 ± 0.35% accuracy in training but 41.0 ± 0.27% in testing, a gap of about 56 percentage points between the two phases.
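
To put the reported accuracies in context, the short calculation below compares them with each other and with random guessing. It is a sketch under one stated assumption: the chance level is taken as 1/51 because HMDB51 has 51 action classes, and this holds only if the classes are evenly represented. The helper names are hypothetical.

```python
NUM_CLASSES = 51  # HMDB51 contains 51 action classes


def chance_accuracy(num_classes):
    """Accuracy (in percent) of uniform random guessing over balanced classes."""
    return 100.0 / num_classes


def generalization_gap(train_acc, test_acc):
    """Difference between training and testing accuracy, in percentage points."""
    return train_acc - test_acc


if __name__ == "__main__":
    train_acc, test_acc = 96.7, 41.0  # mean accuracies reported in the summary
    gap = generalization_gap(train_acc, test_acc)
    chance = chance_accuracy(NUM_CLASSES)
    print(f"train-test gap: {gap:.1f} percentage points")
    print(f"chance level:   {chance:.2f}%")
    print(f"test accuracy relative to chance: {test_acc / chance:.1f}x")
```

Under that assumption the test accuracy is roughly 21 times the chance level, while the train-test gap of 55.7 percentage points points to substantial overfitting on the training split.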

Conclusions:

Deep learning models, particularly Vision Transformers, show significant potential for complex human action recognition.
Semi-supervised learning and DINO enhance the effectiveness of HAR models.
The proposed ViT-LSTM architecture provides a robust solution for video-based human action recognition.