How improper dataset split hinders model generalizability: a systematic comparison in Human activity recognition and exercise evaluation tasks

Area of Science:

Artificial Intelligence
Machine Learning
Computer Vision
Healthcare Technology

Background:

Human Activity Recognition (HAR) and exercise assessment models are vital for healthcare applications like clinical evaluation and remote monitoring.
Real-world applicability hinges on generalizing to unseen subjects, yet many studies use non-cross-subject (NCS) splits, in which the same subjects contribute data to both the training and test sets, and this inflates performance estimates.
Because movement patterns vary between individuals, inflated estimates can give clinicians misplaced confidence in a model.
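
The difference between the two split strategies can be sketched in a few lines of Python. This is a minimal illustration on made-up records, not the study's code: the `subject` and `recording` fields, the function names, and the 80/20 ratio are all assumptions chosen for the example.

```python
import random


def ncs_split(samples, test_fraction=0.2, seed=0):
    """Non-cross-subject split: shuffle individual samples, ignoring who produced them."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def cs_split(samples, test_fraction=0.2, seed=0):
    """Cross-subject split: hold out whole subjects so none appears on both sides."""
    rng = random.Random(seed)
    subjects = sorted({s["subject"] for s in samples})
    rng.shuffle(subjects)
    n_test = max(1, int(round(len(subjects) * test_fraction)))
    test_subjects = set(subjects[:n_test])
    train = [s for s in samples if s["subject"] not in test_subjects]
    test = [s for s in samples if s["subject"] in test_subjects]
    return train, test


def shared_subjects(train, test):
    """Subjects that contribute data to both sets (should be empty for a CS split)."""
    return {s["subject"] for s in train} & {s["subject"] for s in test}


# Toy data: 10 hypothetical subjects with 20 recordings each.
samples = [{"subject": sid, "recording": r} for sid in range(10) for r in range(20)]

ncs_train, ncs_test = ncs_split(samples)
cs_train, cs_test = cs_split(samples)

print("NCS shared subjects:", len(shared_subjects(ncs_train, ncs_test)))
print("CS shared subjects: ", len(shared_subjects(cs_train, cs_test)))
```

Checking `shared_subjects` before training is a cheap way to confirm that a split is genuinely cross-subject.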

Purpose of the Study:

Investigate the impact of non-cross-subject (NCS) versus cross-subject (CS) data splits on machine learning and deep learning model performance.
Analyze how data splitting strategies and training-test set differences influence predictive variance and model stability.
Assess performance across tasks of varying complexity in Human Activity Recognition.

Main Methods:

Experiments utilized the large-scale NTU RGB+D 120 and IntelliRehabDS datasets for HAR and rehabilitation tasks.
Evaluated 12 machine learning and deep learning models using a simulation-based approach to compare NCS and CS split performance.
Employed predictive variance decomposition via Generalized Linear Mixed-Effects models to link split strategy to model stability.
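
For readers unfamiliar with the model class named above, the textbook form of a random-intercept generalized linear mixed-effects model is shown below. This is the generic definition, not the specification used in the study; the choice of grouping factor $j$ and of the covariates in $\mathbf{x}_{ij}$ is left open here because the summary does not state them.

```latex
g\bigl(\mathbb{E}[\,y_{ij} \mid u_j\,]\bigr) = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j,
\qquad u_j \sim \mathcal{N}(0, \sigma_u^{2})
```

Here $g$ is the link function, $\boldsymbol{\beta}$ holds the fixed effects, and $u_j$ is the random intercept of group $j$. The variance $\sigma_u^{2}$ captures how much of the variation in the outcome is attributable to differences between groups rather than to the fixed effects.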

Main Results:

Non-cross-subject (NCS) splits consistently overestimated model performance, especially with increasing task and model complexity.
Deep learning architectures scored significantly higher under NCS splits than under CS splits.
Greater subject differences between training and test sets increased predictive instability, while CS splits promoted more generalizable representations.
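
The inflation effect can be reproduced on synthetic data. The sketch below is a toy demonstration under stated assumptions, not the study's experiment: the data are simulated, each hypothetical subject gets a large random offset standing in for inter-individual variability, and a 1-nearest-neighbour classifier stands in for the 12 models evaluated in the paper.

```python
import math
import random


def make_data(n_subjects=10, per_class=10, seed=0):
    """Simulate 2-D features whose subject offset is much larger than the class signal."""
    rng = random.Random(seed)
    data = []
    for sid in range(n_subjects):
        # Large subject-specific offset: stands in for inter-individual variability.
        off = (rng.gauss(0, 5.0), rng.gauss(0, 5.0))
        for label in (0, 1):
            for _ in range(per_class):
                x = (off[0] + 0.5 * label + rng.gauss(0, 0.1),
                     off[1] + rng.gauss(0, 0.1))
                data.append((sid, x, label))
    return data


def one_nn_accuracy(train, test):
    """Accuracy of a 1-nearest-neighbour classifier using Euclidean distance."""
    correct = 0
    for _, x, y in test:
        _, _, pred = min(train, key=lambda t: math.dist(t[1], x))
        correct += int(pred == y)
    return correct / len(test)


data = make_data()

# NCS: random 80/20 split over samples, so test subjects also appear in training.
rng = random.Random(1)
shuffled = data[:]
rng.shuffle(shuffled)
ncs_test, ncs_train = shuffled[:40], shuffled[40:]

# CS: hold out two whole subjects (IDs 8 and 9) for testing.
cs_train = [d for d in data if d[0] < 8]
cs_test = [d for d in data if d[0] >= 8]

ncs_acc = one_nn_accuracy(ncs_train, ncs_test)
cs_acc = one_nn_accuracy(cs_train, cs_test)
print(f"NCS accuracy: {ncs_acc:.2f}  CS accuracy: {cs_acc:.2f}")
```

Under the NCS split the classifier can match each test sample to a training sample from the same subject, so it scores well without learning anything that transfers to a new person. Under the CS split that shortcut is unavailable, and the score reflects performance on unseen subjects.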

Conclusions:

Incorrect dataset splits can exaggerate generalization capabilities and undermine trust in AI models for rehabilitation and healthcare.
This study offers empirical evidence and methodological guidance for robust evaluation of computer vision-based rehabilitation models.
Promoting reproducible and trustworthy AI deployment is essential for broader healthcare applications.