Vision-Based Human Action Recognition Using Skeleton Graph Neural Networks

Authors

  • Sajud Hamza Elinjulliparambil, Pace University, United States

DOI:

https://doi.org/10.63282/3050-9262.IJAIDSML-V5I3P116

Keywords:

Human Action Recognition (HAR), Skeleton-Based Action Recognition, Graph Neural Networks (GNNs), Self-Supervised Learning

Abstract

Human action recognition (HAR) is an essential computer vision task that enables machines to automatically represent and classify human actions from visual data. Traditional HAR methods rely on RGB or RGB-D videos and, despite their success in controlled settings, are prone to occlusion, lighting variation, and high computational complexity. Skeleton-based HAR has emerged as an alternative that represents human poses as sets of joints or keypoints and tracks both the spatial and temporal dynamics of movement. This abstraction improves resilience to visual noise, lowers computational cost, and supports real-time use. Graph Neural Networks (GNNs), and in particular Graph Convolutional Networks (GCNs), have achieved remarkable success in skeleton-based HAR by modeling the human body as a spatio-temporal graph in which joints are nodes and bones or temporal links are edges. This paper provides a systematic review of vision-based human action recognition with skeleton GNNs, covering skeleton representation, graph construction and evolution, early and advanced GNN architectures, and enhancements such as adaptive graphs, attention mechanisms, and hierarchical modeling. Critical challenges, including skeleton noise, temporal misalignment, cross-view generalization, over-smoothing in deep GCNs, and computational complexity, are addressed along with proposed solutions. We also highlight applications in surveillance, healthcare, human-computer interaction, and robotics. Finally, future research directions are suggested, including multimodal learning, lightweight models for edge devices, self-supervised learning, and cross-dataset generalization. This review offers a solid foundation for researchers seeking to build robust, efficient, and interpretable skeleton-based HAR systems.
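To make the "joints as nodes, bones as edges" idea from the abstract concrete, the following is a minimal NumPy sketch of one spatial graph-convolution step over a skeleton sequence. The toy 5-joint skeleton, bone list, and dimensions are illustrative assumptions, not the architecture of any cited paper; the normalization H' = D^(-1/2)(A+I)D^(-1/2) H W is the standard GCN formulation applied independently per frame.

```python
import numpy as np

# Hypothetical toy skeleton: 5 joints (head, neck, l-hand, r-hand, hip),
# with bones as undirected edges -- an illustrative assumption only.
bones = [(0, 1), (1, 2), (1, 3), (1, 4)]
num_joints = 5

# Spatial adjacency with self-loops: A + I.
A = np.eye(num_joints)
for i, j in bones:
    A[i, j] = A[j, i] = 1.0

# Symmetric normalization: A_hat = D^(-1/2) (A + I) D^(-1/2).
deg = A.sum(axis=1)
D_inv_sqrt = np.diag(deg ** -0.5)
A_hat = D_inv_sqrt @ A @ D_inv_sqrt

# One spatial graph convolution per frame: H'[t] = A_hat @ H[t] @ W.
T, C_in, C_out = 8, 3, 16           # frames, input channels (x, y, z), output channels
rng = np.random.default_rng(0)
H = rng.normal(size=(T, num_joints, C_in))   # skeleton sequence
W = rng.normal(size=(C_in, C_out))           # learnable weights (random here)

H_out = np.einsum("ij,tjc,cd->tid", A_hat, H, W)
print(H_out.shape)   # (8, 5, 16)
```

A full spatio-temporal model such as ST-GCN [15] would interleave steps like this with temporal convolutions along the frame axis and use partition-specific adjacency matrices rather than a single A_hat.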

References

[1] P. Pareek and A. Thakkar, “A survey on video-based human action recognition: Recent updates, datasets, challenges, and applications,” Artif. Intell. Rev., vol. 54, no. 3, pp. 2259–2322, 2021.

[2] A. Lentzas and D. Vrakas, “Non-intrusive human activity recognition and abnormal behavior detection on elderly people: A review,” Artif. Intell. Rev., vol. 53, no. 3, pp. 1975–2021, 2020.

[3] U. E. Ogenyi et al., “Physical human–robot collaboration: Robotic systems, learning methods, collaborative strategies, sensors, and actuators,” IEEE Trans. Cybern., vol. 51, no. 4, pp. 1888–1901, 2019.

[4] M. B. Shaikh and D. Chai, “RGB-D data-based action recognition: A review,” Sensors, vol. 21, no. 12, Art. no. 4246, 2021.

[5] R. Abdulghafor, S. Turaev, and M. A. H. Ali, “Body language analysis in healthcare: An overview,” Healthcare, vol. 10, no. 7, 2022.

[6] Z. Wang et al., “Skeleton-based human pose recognition using channel state information: A survey,” Sensors, vol. 22, no. 22, Art. no. 8738, 2022.

[7] S. Ghidoni and M. Munaro, “A multi-viewpoint feature-based re-identification system driven by skeleton keypoints,” Robot. Auton. Syst., vol. 90, pp. 45–54, 2017.

[8] N. Baka et al., “2D–3D shape reconstruction of the distal femur from stereo X-ray imaging using statistical shape models,” Med. Image Anal., vol. 15, no. 6, pp. 840–850, 2011.

[9] M. Feng and J. Meunier, “Skeleton graph-neural-network-based human action recognition: A survey,” Sensors, vol. 22, no. 6, Art. no. 2091, 2022.

[10] L. Shi et al., “Skeleton-based action recognition with directed graph neural networks,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Long Beach, CA, USA, 2019, pp. 7912–7921.

[11] F. Wu et al., “Simplifying graph convolutional networks,” in Proc. Int. Conf. Mach. Learn. (ICML), Long Beach, CA, USA, 2019, pp. 6861–6871.

[12] Y. Hu et al., “Graph-MLP: Node classification without message passing in graph,” arXiv preprint, arXiv:2106.04051, 2021.

[13] P. V. V. Kishore et al., “Spatial joint features for 3D human skeletal action recognition system using spatial graph kernels,” Int. J. Eng. Technol., vol. 7, pp. 489–493, 2018.

[14] D. Jani et al., “Repositioning the knee joint in human body FE models using a graphics-based technique,” Traffic Injury Prevention, vol. 13, no. 6, pp. 640–649, 2012.

[15] S. Yan, Y. Xiong, and D. Lin, “Spatial temporal graph convolutional networks for skeleton-based action recognition,” in Proc. AAAI Conf. Artif. Intell., vol. 32, no. 1, 2018, pp. 7444–7452.

[16] Q. Wang, K. Zhang, and M. A. Asghar, “Skeleton-based ST-GCN for human action recognition with extended skeleton graph and partitioning strategy,” IEEE Access, vol. 10, pp. 41403–41410, 2022.

[17] X. Ding, K. Yang, and W. Chen, “An attention-enhanced recurrent graph convolutional network for skeleton-based action recognition,” in Proc. 2nd Int. Conf. Signal Process. Mach. Learn., 2019, pp. 1–6.

[18] L. Dang et al., “MSR-GCN: Multi-scale residual graph convolution networks for human motion prediction,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Montreal, QC, Canada, 2021, pp. 11467–11476.

[19] S. Sarker et al., “Skeleton-based activity recognition: Preprocessing and approaches,” in Contactless Human Activity Analysis, Cham, Switzerland: Springer, 2021, pp. 43–81.

[20] S. Song et al., “Learning to recognize human actions from noisy skeleton data via noise adaptation,” IEEE Trans. Multimedia, vol. 24, pp. 1152–1163, 2021.

[21] M. Kan, S. Shan, and X. Chen, “Multi-view deep network for cross-view classification,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Las Vegas, NV, USA, 2016, pp. 4847–4856.

[22] F. Angelini, Novel Methods for Posture-Based Human Action Recognition and Activity Anomaly Detection, Ph.D. dissertation, Newcastle Univ., Newcastle upon Tyne, U.K., 2020.

[23] M. S. Kibbanahalli Shivalingappa and M. Swamy, “Real-time human action and gesture recognition using skeleton joints information towards medical applications,” 2020.

[24] Y. Ji et al., “A survey of human action analysis in HRI applications,” IEEE Trans. Circuits Syst. Video Technol., vol. 30, no. 7, pp. 2114–2128, 2019.

Published

2024-10-30

Issue

Section

Articles

How to Cite

Elinjulliparambil SH. Vision-Based Human Action Recognition Using Skeleton Graph Neural Networks. IJAIDSML [Internet]. 2024 Oct. 30 [cited 2026 Mar. 9];5(3):148-56. Available from: https://ijaidsml.org/index.php/ijaidsml/article/view/380