Are Spatial-Temporal Graph Convolution Networks for Human Action Recognition Over-Parameterized?

📅 2025-05-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates whether spatial-temporal graph convolutional networks (ST-GCNs) for skeleton-based action recognition suffer from over-parameterization. Grounded in the lottery ticket hypothesis, the authors provide the first empirical evidence of substantial parameter redundancy in ST-GCNs. To address this, they propose a trainable sparse structure generator that enables end-to-end structured pruning, and a multi-level sparse ensemble architecture that jointly optimizes sparsity across heterogeneous granularities. Experiments show that the optimal sparse model retains only 5% of the original parameters while losing less than 1% top-1 accuracy. The multi-level ensemble, using only 66% of the original parameters, improves average top-1 accuracy by more than 1% on mainstream benchmarks including NTU and Kinetics. These results indicate that sparsification not only compresses models but also improves generalization.
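The pruning evidence rests on the lottery ticket hypothesis: a dense network contains a sparse subnetwork that, trained in isolation, matches the dense model. The paper's generator learns its sparse structure end-to-end, but the classic criterion it builds on is magnitude pruning. The sketch below is a hypothetical, simplified illustration of that criterion on a flat weight list (the function name and the toy data are not from the paper):

```python
import random

def magnitude_prune_mask(weights, keep_ratio):
    """Binary mask keeping only the largest-magnitude weights.

    A toy stand-in for the paper's trainable sparse structure
    generator: here we simply rank weights by |w| and keep the
    top keep_ratio fraction, the classic lottery-ticket criterion.
    """
    n_keep = max(1, int(len(weights) * keep_ratio))
    # Rank indices by weight magnitude, largest first.
    ranked = sorted(range(len(weights)),
                    key=lambda i: abs(weights[i]), reverse=True)
    kept = set(ranked[:n_keep])
    return [1 if i in kept else 0 for i in range(len(weights))]

random.seed(0)
w = [random.gauss(0, 1) for _ in range(100)]
# Keep 5% of parameters, matching the paper's most aggressive setting.
mask = magnitude_prune_mask(w, keep_ratio=0.05)
print(sum(mask))  # 5 weights survive
```

In the actual method the mask is structured (whole channels or graph edges) and learned jointly with the network weights rather than fixed by a magnitude threshold.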

📝 Abstract
Spatial-temporal graph convolutional networks (ST-GCNs) showcase impressive performance in skeleton-based human action recognition (HAR). However, despite the development of numerous models, their recognition performance does not differ significantly after aligning the input settings. With this observation, we hypothesize that ST-GCNs are over-parameterized for HAR, a conjecture subsequently confirmed through experiments employing the lottery ticket hypothesis. Additionally, a novel sparse ST-GCNs generator is proposed, which trains a sparse architecture from a randomly initialized dense network while maintaining comparable performance levels to the dense components. Moreover, we generate multi-level sparsity ST-GCNs by integrating sparse structures at various sparsity levels and demonstrate that the assembled model yields a significant enhancement in HAR performance. Thorough experiments on four datasets, including NTU-RGB+D 60 (120), Kinetics-400, and FineGYM, demonstrate that the proposed sparse ST-GCNs can achieve comparable performance to their dense components. Even with 95% fewer parameters, the sparse ST-GCNs exhibit a degradation of <1% in top-1 accuracy. Meanwhile, the multi-level sparsity ST-GCNs, which require only 66% of the parameters of the dense ST-GCNs, demonstrate an improvement of >1% in top-1 accuracy. The code is available at https://github.com/davelailai/Sparse-ST-GCN.
Problem

Research questions and friction points this paper is trying to address.

Investigates over-parameterization in ST-GCNs for action recognition
Proposes sparse ST-GCNs generator to reduce redundant parameters
Demonstrates multi-level sparsity ST-GCNs improve recognition performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses lottery ticket hypothesis for ST-GCN sparsity
Proposes sparse ST-GCN generator from dense networks
Integrates multi-level sparsity for enhanced HAR performance
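The multi-level sparsity idea combines models pruned to different granularities into one ensemble. A minimal hedged sketch of how such an ensemble could fuse predictions, assuming simple score averaging (the paper integrates the sparse structures inside one architecture; the function and the toy logits below are illustrative only):

```python
def ensemble_logits(logits_per_level):
    """Average class scores from sparse models at different sparsity levels.

    Hypothetical sketch: each sparse model contributes one logit vector
    over the action classes, and the ensemble averages them elementwise.
    """
    n = len(logits_per_level)
    return [sum(col) / n for col in zip(*logits_per_level)]

# Three hypothetical sparse models scoring four action classes.
level_logits = [
    [2.0, 0.5, 0.1, 0.2],
    [1.5, 0.8, 0.3, 0.1],
    [1.8, 0.4, 0.2, 0.3],
]
fused = ensemble_logits(level_logits)
pred = max(range(len(fused)), key=fused.__getitem__)
print(pred)  # class 0 has the highest fused score
```

Because each sparsity level prunes away a different part of the dense network, their errors are partly decorrelated, which is the intuition behind the >1% top-1 gain reported for the ensemble.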