LSTC-MDA: A Unified Framework for Long-Short Term Temporal Convolution and Mixed Data Augmentation in Skeleton-Based Action Recognition

📅 2025-09-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Skeleton-based action recognition has long suffered from scarce labeled data and from the difficulty of modeling both short- and long-term temporal dependencies. To address these issues, this paper proposes LSTC-MDA, a unified framework with two core innovations: (1) a Long-Short Term Temporal Convolution (LSTC) module with parallel short- and long-term branches whose features are aligned and adaptively fused via learned similarity weights; and (2) a view-aware mixed data augmentation strategy that extends Joint Mixing Data Augmentation (JMDA) with input-level Additive Mixup, restricting mixing to samples from the same camera view to avoid the distribution shift caused by cross-view mixing. Evaluated on the NTU-60, NTU-120, and NW-UCLA benchmarks, LSTC-MDA achieves state-of-the-art accuracies of 94.1% (X-Sub), 90.4% (X-Sub), and 97.2%, respectively, outperforming prior methods on these benchmarks.

📝 Abstract
Skeleton-based action recognition faces two longstanding challenges: the scarcity of labeled training samples and the difficulty of modeling short- and long-range temporal dependencies. To address these issues, we propose a unified framework, LSTC-MDA, which simultaneously improves temporal modeling and data diversity. We introduce a novel Long-Short Term Temporal Convolution (LSTC) module with parallel short- and long-term branches; these two feature branches are then aligned and fused adaptively using learned similarity weights to preserve critical long-range cues lost by conventional stride-2 temporal convolutions. We also extend Joint Mixing Data Augmentation (JMDA) with an Additive Mixup at the input level, diversifying training samples while restricting mixup operations to the same camera view to avoid distribution shifts. Ablation studies confirm that each component contributes. LSTC-MDA achieves state-of-the-art results: 94.1% and 97.5% on NTU 60 (X-Sub and X-View), 90.4% and 92.0% on NTU 120 (X-Sub and X-Set), and 97.2% on NW-UCLA. Code: https://github.com/xiaobaoxia/LSTC-MDA.
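The abstract describes fusing the short- and long-term branch features with learned similarity weights. The paper's exact formulation is not reproduced here; the following is a minimal NumPy sketch of one plausible similarity-weighted fusion, where the function name, shapes, and the sigmoid weighting are illustrative assumptions.

```python
import numpy as np

def similarity_weighted_fusion(short_feat, long_feat):
    """Illustrative stand-in (not the paper's exact rule): weight the
    fusion of two temporal branches by their per-channel cosine
    similarity along the time axis."""
    # Cosine similarity between the branches, one value per channel
    num = np.sum(short_feat * long_feat, axis=-1)
    den = (np.linalg.norm(short_feat, axis=-1)
           * np.linalg.norm(long_feat, axis=-1) + 1e-8)
    sim = num / den                       # shape: (channels,)
    w = 1.0 / (1.0 + np.exp(-sim))        # squash to (0, 1)
    # Convex combination: similar channels lean on the short-term
    # branch, dissimilar ones retain more of the long-range cue.
    return w[:, None] * short_feat + (1.0 - w[:, None]) * long_feat

short = np.random.default_rng(0).normal(size=(64, 32))  # (channels, frames)
long_ = np.random.default_rng(1).normal(size=(64, 32))
fused = similarity_weighted_fusion(short, long_)
print(fused.shape)  # (64, 32)
```

In a real model the weights would come from a learned projection rather than raw cosine similarity; the sketch only shows where similarity-driven weighting enters the fusion.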
Problem

Research questions and friction points this paper is trying to address.

Addressing scarcity of labeled training samples
Modeling short- and long-range temporal dependencies
Improving temporal modeling and data diversity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Long-Short Term Temporal Convolution module
Adaptive fusion with learned similarity weights
Joint Mixing Data Augmentation with Additive Mixup
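The augmentation side pairs Additive Mixup at the input level with a same-view restriction to avoid cross-view distribution shift. A hedged NumPy sketch of that idea follows; the function name, tensor shapes, and the exact additive rule are assumptions for illustration, not the paper's published implementation.

```python
import numpy as np

def additive_mixup_same_view(skeletons, views, alpha=0.2, seed=0):
    """Hypothetical input-level Additive Mixup restricted to one camera
    view. skeletons: (N, T, J, C) joint coordinates; views: (N,) ids."""
    rng = np.random.default_rng(seed)
    mixed = skeletons.copy()
    for v in np.unique(views):
        idx = np.where(views == v)[0]   # candidates within this view only
        perm = rng.permutation(idx)     # mixing partner from the same view
        lam = rng.beta(alpha, alpha, size=len(idx))[:, None, None, None]
        # Additive variant (assumed): inject a scaled partner sample on
        # top of the original rather than taking a convex combination.
        mixed[idx] = skeletons[idx] + lam * skeletons[perm]
    return mixed

x = np.random.default_rng(2).normal(size=(8, 64, 25, 3))  # 8 clips, 25 joints
v = np.array([0, 0, 0, 0, 1, 1, 1, 1])                    # two camera views
out = additive_mixup_same_view(x, v)
print(out.shape)  # (8, 64, 25, 3)
```

Grouping by view id before permuting is what enforces the same-camera-view constraint: a clip is only ever mixed with another clip recorded from the same viewpoint.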
Feng Ding
Suzhou Laboratory
Physics, Chemistry, Material Science
Haisheng Fu
The University of British Columbia, Postdoctoral Fellow
Deep Learning, Image Compression, Video Compression, IC Design, Hardware Implementation, Cryptography
Soroush Oraki
School of Engineering Science, Simon Fraser University, BC, Canada
Jie Liang
School of Engineering Science, Simon Fraser University, BC, Canada