Ensuring the safety and well-being of elderly and vulnerable people in assisted living environments is a critical concern. Computer vision offers an innovative approach to predicting health risks through video monitoring, employing human action recognition (HAR) technology. However, predicting human actions in real time with high performance and efficiency remains a challenge. This research proposes a real-time HAR model that combines a deep learning model with a live video prediction and alert system to predict falls, staggering, and chest pain for residents in assisted living. Six thousand RGB video samples from the NTU RGB+D 60 dataset were selected to create a dataset with four classes: Falling, Staggering, Chest Pain, and Normal, the last comprising 40 daily actions. Four state-of-the-art HAR models, namely UniFormerV2, TimeSformer, I3D, and SlowFast, were trained in six variants on a GPU using transfer learning. Results are reported in terms of class-wise and macro performance metrics, inference efficiency, model complexity, and computational cost. The best-performing model, TimeSformer, achieved a macro F1 score of 95.33%, a macro recall of 95.49%, and a macro precision of 95.19% with superior inference throughput, and was therefore used in the design of the real-time HAR architecture. This research provides insights for the real-time prediction of health risks in assisted living, enhancing safety, sustainable care, and smart communities.
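As a minimal sketch of the transfer-learning setup described above, the snippet below fine-tunes a Kinetics-400-pretrained TimeSformer for the four-class task using the Hugging Face `transformers` library. The abstract does not specify the training framework, checkpoint, or hyperparameters, so the library choice, the `facebook/timesformer-base-finetuned-k400` checkpoint, and the input shape are illustrative assumptions, not the authors' exact pipeline.

```python
# Sketch (assumption): adapting a pretrained TimeSformer to the 4-class
# assisted-living task via transfer learning; the paper's actual training
# framework and checkpoint are not stated in the abstract.
import torch
from transformers import TimesformerForVideoClassification

NUM_CLASSES = 4  # Falling, Staggering, Chest Pain, Normal

model = TimesformerForVideoClassification.from_pretrained(
    "facebook/timesformer-base-finetuned-k400",  # Kinetics-400 weights
    num_labels=NUM_CLASSES,
    ignore_mismatched_sizes=True,  # swap the 400-way head for a 4-way head
)

# Dummy clip with shape (batch, frames, channels, height, width);
# this checkpoint expects 8 frames at 224 x 224 resolution.
clip = torch.randn(1, 8, 3, 224, 224)
logits = model(pixel_values=clip).logits
predicted_class = logits.argmax(dim=-1)  # index into the four classes
```

In a live deployment such as the alert system described here, frames sampled from the camera stream would replace the dummy tensor, and the predicted class would trigger an alert for the three risk classes.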