TSGCNeXt: Dynamic-Static Multi-Graph Convolution for Efficient Skeleton-Based Action Recognition with Long-term Learning Potential

📅 2023-04-23
🏛️ arXiv.org
📈 Citations: 8
Influential: 1
🤖 AI Summary
To address structural redundancy, inefficient dynamic graph learning, and insufficient long-term temporal modeling in GCN-based skeleton action recognition, this paper proposes a Dynamic-Static Separate Multi-graph Convolution (DS-SMG) mechanism. DS-SMG aggregates features from multiple independent topological graphs and prevents node information from being ignored during dynamic convolution. The authors further design a back-propagation acceleration strategy for dynamic graph learning, achieving a 55.08% training speed-up. Finally, they restructure the GCN into a ConvNeXt-inspired architecture of three spatio-temporal learning modules and introduce an Exponential Moving Average (EMA) model into multi-stream fusion. On NTU RGB+D 60 and 120, the single-stream network outperforms existing single-stream methods, and the multi-stream variant reaches state-of-the-art accuracies of 90.22% (cross-subject) and 91.74% (cross-setup) on NTU RGB+D 120.
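To make the DS-SMG idea concrete, here is a minimal pure-Python sketch of multi-graph convolution in which each topology (fixed skeleton graphs plus one feature-dependent dynamic graph) is convolved independently as A_k X W_k and the per-graph results are summed. This is not the paper's implementation: the similarity-softmax dynamic graph, the per-graph weight matrices, and all function names are assumptions for illustration, and the temporal axis and batching are omitted.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def mat_add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def row_softmax(a: Matrix) -> Matrix:
    """Numerically stable softmax over each row."""
    out = []
    for row in a:
        m = max(row)
        e = [math.exp(v - m) for v in row]
        s = sum(e)
        out.append([v / s for v in e])
    return out


def dynamic_graph(x: Matrix) -> Matrix:
    """Hypothetical dynamic topology: row-softmax of node-feature similarities X X^T."""
    return row_softmax(matmul(x, transpose(x)))


def ds_multi_graph_conv(x: Matrix, static_graphs: List[Matrix], weights: List[Matrix]) -> Matrix:
    """Sum of independent per-graph convolutions A_k X W_k (illustrative, not the paper's code).

    x:             N x C_in node features for one frame.
    static_graphs: fixed N x N adjacency matrices (e.g. the skeleton graph).
    weights:       one C_in x C_out matrix per graph; the last one is for the dynamic graph.
    """
    graphs = list(static_graphs) + [dynamic_graph(x)]
    if len(graphs) != len(weights):
        raise ValueError("need one weight matrix per graph")
    out = None
    for a, w in zip(graphs, weights):
        y = matmul(matmul(a, x), w)  # aggregate over neighbours, then mix channels
        out = y if out is None else mat_add(out, y)
    return out


if __name__ == "__main__":
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 joints, 2 channels
    chain = [[0.5, 0.5, 0.0], [1 / 3, 1 / 3, 1 / 3], [0.0, 0.5, 0.5]]  # row-normalised chain with self-loops
    w = [[1.0, 0.0], [0.0, 1.0]]
    print(ds_multi_graph_conv(x, [chain], [w, w]))
```

Keeping each graph's convolution as its own term means the static skeleton graph still contributes even when the dynamic graph assigns a node little weight, which is one possible reading of the claim that DS-SMG avoids node information being ignored.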
📝 Abstract
Skeleton-based action recognition has achieved remarkable results in human action recognition with the development of graph convolutional networks (GCNs). However, recent works tend to construct complex learning mechanisms with redundant training, and they hit a bottleneck on long time series. To solve these problems, we propose the Temporal-Spatio Graph ConvNeXt (TSGCNeXt) to explore an efficient learning mechanism for long temporal skeleton sequences. First, a new graph learning mechanism with a simple structure, Dynamic-Static Separate Multi-graph Convolution (DS-SMG), is proposed to aggregate features of multiple independent topological graphs and to avoid node information being ignored during dynamic convolution. Next, we construct a graph convolution training acceleration mechanism that optimizes the back-propagation computation of dynamic graph learning, yielding a 55.08% speed-up. Finally, TSGCNeXt restructures the overall GCN architecture with three spatio-temporal learning modules, efficiently modeling long temporal features. Compared with existing methods on the large-scale datasets NTU RGB+D 60 and 120, TSGCNeXt performs best among single-stream networks. In addition, with an exponential moving average (EMA) model introduced into the multi-stream fusion, TSGCNeXt achieves state-of-the-art results, reaching accuracies of 90.22% and 91.74% on the cross-subject and cross-setup benchmarks of NTU RGB+D 120.
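The abstract only states that an EMA model is introduced into multi-stream fusion. A common pattern, assumed here rather than taken from the paper, is to keep an exponential moving average of each stream's weights during training and, at test time, fuse the per-stream class scores (e.g. joint and bone streams) by a weighted sum. The pure-Python sketch below shows both pieces; `ema_update`, `fuse_streams`, the 0.999 default decay, and the stream weights are all hypothetical.

```python
import math
from typing import Dict, List


def ema_update(ema: Dict[str, float], current: Dict[str, float], decay: float = 0.999) -> Dict[str, float]:
    """Exponential moving average of parameters: ema <- decay * ema + (1 - decay) * current.

    The default decay is an illustrative assumption, not a value from the paper.
    """
    return {name: decay * ema[name] + (1.0 - decay) * current[name] for name in ema}


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one score vector."""
    m = max(logits)
    e = [math.exp(v - m) for v in logits]
    s = sum(e)
    return [v / s for v in e]


def fuse_streams(stream_logits: List[List[float]], stream_weights: List[float]) -> int:
    """Weighted sum of per-stream class probabilities; returns the predicted class index."""
    num_classes = len(stream_logits[0])
    fused = [0.0] * num_classes
    for logits, w in zip(stream_logits, stream_weights):
        for c, p in enumerate(softmax(logits)):
            fused[c] += w * p
    return max(range(num_classes), key=lambda c: fused[c])


if __name__ == "__main__":
    ema = {"w": 0.0}
    for value in [1.0, 1.0, 1.0]:  # parameter value after each (hypothetical) optimizer step
        ema = ema_update(ema, {"w": value}, decay=0.9)
    print(ema)
    print(fuse_streams([[2.0, 0.0, 0.0], [0.0, 3.0, 0.0]], [1.0, 1.0]))
```

Summing probabilities rather than raw logits keeps streams with different logit scales comparable; in this sketch the stream weights are free parameters that would typically be chosen on held-out data.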
Problem

Research questions and friction points this paper is trying to address.

Efficient learning for long skeleton action sequences
Overcoming complex redundant training mechanisms
Addressing long time-series modeling bottlenecks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic-Static Separate Multi-graph Convolution mechanism
Graph convolution training acceleration mechanism
Architecture of three spatio-temporal learning modules
Dongjingdin Liu
China University of Mining and Technology (CUMT)
Pengpeng Chen
Beihang University
Machine Learning, Big Data
Miao Yao
China University of Mining and Technology (CUMT)
Yijing Lu
China University of Mining and Technology (CUMT)
Zijie Cai
China University of Mining and Technology (CUMT)
Yuxin Tian
Ph.D. Candidate, Sichuan University
Deep Learning, Machine Learning