Video-Based Human Activity Recognition Using Deep Learning Approaches.

Guilherme Augusto Silva Surek¹, Laio Oriel Seman², Stefano Frizzo Stefenon³,⁴

  • ¹Industrial and Systems Engineering Graduate Program (PPGEPS), Pontifical Catholic University of Parana (PUCPR), Curitiba 80215-901, Brazil.

Sensors (Basel, Switzerland) | July 29, 2023 | PubMed
Summary
This summary is machine-generated.

This study aims to improve human activity recognition using deep learning models such as the Vision Transformer (ViT) and the Residual Network (ResNet) combined with self-supervised learning. The ViT architecture shows promising results for recognizing complex actions in videos.

Keywords:
convolutional neural network; deep learning; self-DIstillation with NO labels (DINO); video human action recognition; vision transformer architecture

Area of Science:

  • Computer Science
  • Artificial Intelligence
  • Machine Learning

Background:

  • Human activity recognition (HAR) is crucial for analyzing human behavior using sensor data.
  • Recognizing actions in videos with multiple interacting entities requires advanced spatial modeling.
  • Deep learning models offer powerful tools for visual reasoning in action recognition tasks.

Purpose of the Study:

  • To evaluate and map the current state of human action recognition in RGB videos using deep learning.
  • To assess the performance of Residual Network (ResNet) and Vision Transformer (ViT) architectures.
  • To investigate the impact of semi-supervised learning and DINO (self-DIstillation with NO labels) on HAR.

Main Methods:

  • Implemented and evaluated ResNet and ViT architectures with a semi-supervised learning approach.
  • Utilized DINO (self-DIstillation with NO labels) to enhance model capabilities.
  • Tested models on the Human Motion Database (HMDB51) benchmark for action recognition.
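DINO trains a student network to match the output distribution of a teacher network whose weights are an exponential moving average (EMA) of the student's weights; the teacher output is also centered and sharpened so that training does not collapse to a trivial solution. The pure-Python sketch below illustrates that generic objective, assuming plain lists stand in for projection-head outputs. The function names, toy vectors, temperatures (`tau_s`, `tau_t`) and momenta are illustrative assumptions, not the code or settings used in this study.

```python
import math
from typing import List, Sequence

Vector = List[float]


def log_softmax(logits: Sequence[float], temperature: float) -> Vector:
    """Temperature-scaled log-softmax, shifted by the max for numerical stability."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def dino_loss(student: Sequence[float], teacher: Sequence[float],
              center: Sequence[float], tau_s: float = 0.1, tau_t: float = 0.04) -> float:
    """Cross-entropy H(P_t, P_s): the teacher output is centered and sharpened
    (low temperature), then used as a fixed target for the student."""
    centered = [t - c for t, c in zip(teacher, center)]
    p_t = [math.exp(v) for v in log_softmax(centered, tau_t)]
    log_p_s = log_softmax(student, tau_s)
    return -sum(pt * lps for pt, lps in zip(p_t, log_p_s))


def multi_view_loss(student_views, teacher_views, center, **kw) -> float:
    """Average the loss over (teacher view, student view) pairs, skipping
    pairs in which both networks saw the same augmented view."""
    total, pairs = 0.0, 0
    for i, t in enumerate(teacher_views):
        for j, s in enumerate(student_views):
            if i == j:
                continue
            total += dino_loss(s, t, center, **kw)
            pairs += 1
    return total / pairs


def ema(old: Sequence[float], new: Sequence[float], momentum: float) -> Vector:
    """Exponential moving average, used for both teacher weights and the center."""
    return [momentum * o + (1.0 - momentum) * n for o, n in zip(old, new)]


def update_center(center, teacher_batch, momentum: float = 0.9) -> Vector:
    """Move the center towards the mean teacher output of the current batch."""
    dim = len(center)
    batch_mean = [sum(row[k] for row in teacher_batch) / len(teacher_batch)
                  for k in range(dim)]
    return ema(center, batch_mean, momentum)


# Toy usage: two global views seen by the teacher, three views seen by the student.
teacher_views = [[1.8, 0.7, -0.9], [1.5, 0.9, -0.6]]
student_views = [[2.0, 0.5, -1.0], [1.7, 0.8, -0.7], [1.2, 1.1, -0.2]]
center = [0.0, 0.0, 0.0]

loss = multi_view_loss(student_views, teacher_views, center)
center = update_center(center, teacher_views)
# Teacher weights follow the student via EMA (toy parameter vectors here).
teacher_params = ema([0.5, -0.2], [0.6, -0.1], momentum=0.996)
```

Centering flattens the teacher distribution while the low teacher temperature sharpens it; balancing the two is what keeps the student from converging to a constant output.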

Main Results:

  • The Vision Transformer (ViT) architecture demonstrated promising performance in video classification.
  • A bi-dimensional ViT was combined with Long Short-Term Memory (LSTM) and evaluated on the HMDB51 dataset.
  • The ViT-LSTM model achieved 96.7 ± 0.35% accuracy in training and 41.0 ± 0.27% in testing, a gap of about 56 percentage points between the two phases.
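The summary does not state exactly how the bi-dimensional ViT and the LSTM are connected. One common reading, assumed here, is that a 2D (image) ViT encodes each frame into an embedding, an LSTM aggregates those embeddings over time, and a linear layer scores the 51 action classes of HMDB51. The pure-Python sketch below covers only that temporal part: `frames` is a random placeholder for per-frame ViT embeddings, all dimensions and names are hypothetical, and the weights are untrained, so the prediction carries no meaning.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]

EMBED_DIM = 16     # hypothetical size of each per-frame ViT embedding
HIDDEN_DIM = 8     # hypothetical LSTM hidden size
NUM_FRAMES = 10    # hypothetical number of sampled frames per clip
NUM_CLASSES = 51   # HMDB51 has 51 action classes


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w: Matrix, x: Vector) -> Vector:
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def rand_matrix(rows: int, cols: int, rng: random.Random, scale: float = 0.1) -> Matrix:
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class LSTMCell:
    """Single LSTM cell; the rows for the input, forget, candidate and output
    gates are stacked in that order."""

    def __init__(self, input_dim: int, hidden_dim: int, rng: random.Random):
        self.hidden_dim = hidden_dim
        self.w_x = rand_matrix(4 * hidden_dim, input_dim, rng)
        self.w_h = rand_matrix(4 * hidden_dim, hidden_dim, rng)
        self.bias = [0.0] * (4 * hidden_dim)

    def step(self, x: Vector, h: Vector, c: Vector):
        z = [zx + zh + b for zx, zh, b in
             zip(matvec(self.w_x, x), matvec(self.w_h, h), self.bias)]
        n = self.hidden_dim
        i = [sigmoid(v) for v in z[0:n]]             # input gate
        f = [sigmoid(v) for v in z[n:2 * n]]         # forget gate
        g = [math.tanh(v) for v in z[2 * n:3 * n]]   # candidate cell state
        o = [sigmoid(v) for v in z[3 * n:4 * n]]     # output gate
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new


def classify_clip(frame_embeddings: List[Vector], cell: LSTMCell, w_out: Matrix) -> Vector:
    """Run the LSTM over the frame sequence and score classes from the last hidden state."""
    h = [0.0] * cell.hidden_dim
    c = [0.0] * cell.hidden_dim
    for x in frame_embeddings:
        h, c = cell.step(x, h, c)
    return matvec(w_out, h)  # one logit per action class


rng = random.Random(0)
# Placeholder for per-frame ViT embeddings (e.g. one [CLS] vector per frame).
frames = [[rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)] for _ in range(NUM_FRAMES)]
cell = LSTMCell(EMBED_DIM, HIDDEN_DIM, rng)
w_out = rand_matrix(NUM_CLASSES, HIDDEN_DIM, rng)

logits = classify_clip(frames, cell, w_out)
predicted_class = max(range(NUM_CLASSES), key=lambda k: logits[k])
```

Passing the whole frame sequence through the LSTM lets the classifier use temporal order, which a frame-level 2D ViT does not model on its own.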

Conclusions:

  • Deep learning models, particularly Vision Transformers, show significant potential for complex human action recognition.
  • Semi-supervised learning and DINO enhance the effectiveness of HAR models.
  • The proposed ViT-LSTM architecture is a promising approach for video-based human action recognition.