Autoregressive Adaptive Hypergraph Transformer for Skeleton-based Activity Recognition

📅 2024-11-08
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address insufficient multi-scale contextual modeling and the difficulty of capturing high-order joint correlations in skeleton-based action recognition, this paper proposes an autoregressive, vector-quantized hypergraph learning framework. Methodologically, it introduces an autoregressive vector-quantized hypergraph generation mechanism, combined with model-agnostic adaptive hyperedge construction, to enable joint spatio-temporal-channel feature learning. A Transformer-based architecture is incorporated to model long-range dependencies, and a hybrid supervised-unsupervised training paradigm jointly optimizes action-specific representations across the spatial, temporal, and channel dimensions. The framework outperforms state-of-the-art hypergraph-based architectures on the NTU RGB+D, NTU RGB+D 120, and NW-UCLA benchmarks, and ablation studies confirm the effectiveness and complementarity of each component.
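The summary does not spell out the hybrid objective. One common way to pair a supervised term with an unsupervised one in a vector-quantized model is classification cross-entropy plus the VQ-VAE codebook and commitment losses. The NumPy sketch below assumes exactly that pairing; the functions `cross_entropy`, `vq_loss`, and `hybrid_loss` and the weights `lam` and `beta` are hypothetical illustrations, not the paper's formulation.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy (supervised term). logits: (B, K), labels: (B,)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def vq_loss(z_e, z_q, beta=0.25):
    """VQ-VAE style unsupervised term: ||sg[z_e] - z_q||^2 + beta * ||z_e - sg[z_q]||^2.

    z_e: encoder outputs, z_q: their quantized (codebook) counterparts.
    NumPy has no autodiff, so the stop-gradient sg[.] is a no-op here and both
    terms share one value; only the weighting by beta distinguishes them.
    """
    mse = ((z_e - z_q) ** 2).mean()
    codebook_term = mse    # would update the codebook in an autodiff framework
    commitment_term = mse  # would update the encoder in an autodiff framework
    return codebook_term + beta * commitment_term

def hybrid_loss(logits, labels, z_e, z_q, lam=1.0, beta=0.25):
    """Hypothetical hybrid objective: supervised CE + lam * unsupervised VQ loss."""
    return cross_entropy(logits, labels) + lam * vq_loss(z_e, z_q, beta)

# Usage: uniform logits over 4 classes and a perfect quantization give log(4).
logits = np.zeros((2, 4))
labels = np.array([0, 3])
z = np.ones((2, 8))
print(hybrid_loss(logits, labels, z, z))
```

The stop-gradient operators only matter when this is ported to an autodiff framework such as PyTorch or JAX, where they route the codebook term to the code vectors and the commitment term to the encoder.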

📝 Abstract
Extracting multiscale contextual information and higher-order correlations among skeleton sequences using Graph Convolutional Networks (GCNs) alone is inadequate for effective action classification. Hypergraph convolution addresses these issues but cannot capture long-range dependencies. Transformers are effective at capturing such dependencies and at making complex contextual features accessible. We propose an Autoregressive Adaptive HyperGraph Transformer (AutoregAd-HGformer) model for in-phase (autoregressive and discrete) and out-phase (adaptive) hypergraph generation. The vector-quantized in-phase hypergraph, equipped with powerful autoregressively learned priors, produces a more robust and informative representation suitable for hyperedge formation. The out-phase hypergraph generator provides a model-agnostic hyperedge learning technique that aligns the attributes with the input skeleton embedding. The hybrid (supervised and unsupervised) learning in AutoregAd-HGformer explores action-dependent features along the spatial, temporal, and channel dimensions. Extensive experimental results and an ablation study indicate the superiority of our model over state-of-the-art hypergraph architectures on the NTU RGB+D, NTU RGB+D 120, and NW-UCLA datasets.
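For reference, a widely used form of hypergraph convolution (the HGNN layer of Feng et al., 2019) propagates node features through the normalized operator Dv^(-1/2) H W De^(-1) H^T Dv^(-1/2), where H is the node-by-hyperedge incidence matrix, W holds hyperedge weights, and Dv and De are the vertex and hyperedge degree matrices. The sketch below implements that standard layer in NumPy; whether AutoregAd-HGformer uses exactly this normalization is an assumption, and `hypergraph_conv` is a hypothetical name.

```python
import numpy as np

def hypergraph_conv(X, H, Theta, w=None):
    """One HGNN-style hypergraph convolution layer.

    X:     (N, C_in)      node features, e.g. one feature vector per skeleton joint
    H:     (N, E)         incidence matrix, H[v, e] = 1 if joint v lies in hyperedge e
    Theta: (C_in, C_out)  learnable projection
    w:     (E,)           optional hyperedge weights (default: all ones)
    """
    N, E = H.shape
    if w is None:
        w = np.ones(E)
    W = np.diag(w)
    dv = H @ w          # vertex degree d(v) = sum_e w(e) * H[v, e]
    de = H.sum(axis=0)  # hyperedge degree delta(e) = sum_v H[v, e]
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(dv))
    De_inv = np.diag(1.0 / de)
    # Normalized node -> hyperedge -> node propagation operator
    P = Dv_inv_sqrt @ H @ W @ De_inv @ H.T @ Dv_inv_sqrt
    return P @ X @ Theta

# 5 joints, 2 hyperedges: {0, 1, 2} (e.g., an arm) and {2, 3, 4} (e.g., the torso)
H = np.array([[1, 0], [1, 0], [1, 1], [0, 1], [0, 1]], dtype=float)
X = np.random.default_rng(0).normal(size=(5, 2))  # 2 input channels per joint
Theta = np.ones((2, 3))                            # project to 3 output channels
print(hypergraph_conv(X, H, Theta).shape)          # (5, 3)
```

Joint 2 belongs to both hyperedges, so it mixes information from both groups in a single step, which is the higher-order correlation a pairwise GCN edge cannot express. Every joint must belong to at least one hyperedge and every hyperedge must be non-empty, or the degree inverses divide by zero.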
Problem

Research questions and friction points this paper is trying to address.

Enhances skeleton-based activity recognition by capturing multiscale contextual information.
Addresses the limitations of GCNs in modeling higher-order joint correlations and of hypergraph convolution in capturing long-range dependencies.
Proposes a hybrid model combining autoregressive and adaptive hypergraph generation techniques.
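The list above attributes long-range modeling to the Transformer. As a generic illustration rather than the paper's exact attention block, single-head scaled dot-product self-attention over flattened joint-frame tokens lets every joint attend to every other joint in every frame in one step, regardless of graph distance. `self_attention`, `softmax`, and the weight matrices `Wq`, `Wk`, `Wv` are hypothetical names.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X: (L, C) tokens, e.g. every joint at every frame flattened into L = T * V.
    Returns the attended features (L, C_v) and the attention matrix (L, L).
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # every token scores every other token
    A = softmax(scores, axis=-1)             # each row is a distribution over tokens
    return A @ V, A

rng = np.random.default_rng(0)
T, V, C = 4, 5, 8                      # frames, joints, channels (illustrative sizes)
tokens = rng.normal(size=(T * V, C))   # flatten joints and frames into T*V tokens
Wq, Wk, Wv = (rng.normal(size=(C, C)) for _ in range(3))
out, A = self_attention(tokens, Wq, Wk, Wv)
print(out.shape, A.shape)              # (20, 8) (20, 20)
```

Flattening T frames of V joints into T*V tokens makes the score matrix (T*V) x (T*V), so many skeleton Transformers factor attention into separate spatial and temporal passes; this sketch keeps a single full pass for clarity.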
Innovation

Methods, ideas, or system contributions that make the work stand out.

Autoregressive Adaptive HyperGraph Transformer model
Vector quantized in-phase hypergraph generation
Model-agnostic hyperedge learning technique
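The two hyperedge generators listed above can be contrasted with a small sketch. Assuming each codebook vector stands for one hyperedge, the in-phase (discrete) path assigns every joint embedding to its nearest code, yielding a hard one-hot incidence matrix; the out-phase (adaptive) path instead turns similarities to learnable prototypes into soft memberships with a softmax. The autoregressive prior over codes is omitted, and `vq_incidence`, `adaptive_incidence`, `prototypes`, and `tau` are hypothetical names, not the authors' API.

```python
import numpy as np

def vq_incidence(Z, codebook):
    """Hard (discrete) hyperedge assignment by vector quantization.

    Z:        (N, D) joint embeddings
    codebook: (K, D) code vectors; each code is treated as one hyperedge
    Returns the (N, K) one-hot incidence matrix and the chosen code indices.
    """
    # Squared Euclidean distance between every joint and every code vector
    d2 = ((Z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d2.argmin(axis=1)              # nearest code per joint
    H = np.zeros((Z.shape[0], codebook.shape[0]))
    H[np.arange(Z.shape[0]), idx] = 1.0  # one hyperedge per joint
    return H, idx

def adaptive_incidence(Z, prototypes, tau=1.0):
    """Soft hyperedge membership from dot-product similarity to prototypes."""
    logits = Z @ prototypes.T / tau                  # (N, E) similarity scores
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    A = np.exp(logits)
    return A / A.sum(axis=1, keepdims=True)          # each row sums to 1

rng = np.random.default_rng(0)
Z = rng.normal(size=(25, 16))      # 25 joint embeddings (NTU RGB+D skeletons have 25 joints)
codes = rng.normal(size=(6, 16))   # 6 code vectors = 6 candidate discrete hyperedges
protos = rng.normal(size=(6, 16))  # 6 learnable prototypes for the adaptive path
H_hard, idx = vq_incidence(Z, codes)
H_soft = adaptive_incidence(Z, protos, tau=0.5)
print(H_hard.sum(axis=0))          # number of joints in each discrete hyperedge
```

A hard assignment can leave some code unused, producing an empty hyperedge; dropping all-zero columns before a degree-normalized hypergraph convolution avoids dividing by zero.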