Spatiotemporal-Untrammelled Mixture of Experts for Multi-Person Motion Prediction

📅 2025-12-25
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the rigidity in spatiotemporal dependency modeling—constrained by positional encoding—and the high computational cost—due to the quadratic complexity of standard attention—in multi-person motion prediction, this paper proposes a positional-encoding-free, low-complexity Mixture-of-Experts (MoE) architecture. Methodologically, it introduces four heterogeneous spatiotemporal experts operating under a novel collaborative mechanism; employs bidirectional spatiotemporal Mamba as a lightweight expert to jointly enable parameter sharing and dynamic sparse routing; and adopts spatiotemporal decoupled modeling with position-free sequence representation. Evaluated on four benchmark datasets, the method achieves state-of-the-art accuracy while reducing model parameters by 41.38% and accelerating training by 3.6× compared to prior approaches.

📝 Abstract
Comprehensively and flexibly capturing the complex spatio-temporal dependencies of human motion is critical for multi-person motion prediction. Existing methods grapple with two primary limitations: i) inflexible spatiotemporal representation due to reliance on positional encodings for capturing spatiotemporal information; ii) high computational costs stemming from the quadratic time complexity of conventional attention mechanisms. To overcome these limitations, we propose the Spatiotemporal-Untrammelled Mixture of Experts (ST-MoE), which flexibly explores complex spatio-temporal dependencies in human motion and significantly reduces computational cost. To adaptively mine complex spatio-temporal patterns from human motion, our model incorporates four distinct types of spatiotemporal experts, each specializing in capturing different spatial or temporal dependencies. To reduce the potential computational overhead while integrating multiple experts, we introduce bidirectional spatiotemporal Mamba as experts, each sharing bidirectional temporal and spatial Mamba in distinct combinations to achieve model efficiency and parameter economy. Extensive experiments on four multi-person benchmark datasets demonstrate that our approach not only outperforms the state of the art in accuracy but also reduces model parameters by 41.38% and achieves a 3.6x speedup in training. The code is available at https://github.com/alanyz106/ST-MoE.
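The abstract's core idea (four experts built as distinct combinations of two shared bidirectional modules, selected by a gating network) can be sketched in a toy form. This is not the paper's implementation: the linear recurrent `scan` below is a hypothetical stand-in for a Mamba SSM block, and the per-sequence softmax router is an assumed simplification of the paper's dynamic sparse routing.

```python
import numpy as np

def scan(x, a=0.9):
    # Simple linear recurrence along axis 0 (toy stand-in for a Mamba SSM).
    h = np.zeros_like(x[0])
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        h = a * h + x[t]
        out[t] = h
    return out

def bi_scan(x):
    # Bidirectional: average of forward and backward scans.
    return 0.5 * (scan(x) + scan(x[::-1])[::-1])

T, P, D = 8, 3, 4  # frames, persons, feature dim (toy sizes)
rng = np.random.default_rng(0)
x = rng.normal(size=(T, P, D))

# Two SHARED bidirectional modules: one over time, one over persons.
temporal = lambda z: bi_scan(z)
spatial = lambda z: np.swapaxes(bi_scan(np.swapaxes(z, 0, 1)), 0, 1)

# Four experts as distinct combinations of the shared modules,
# so adding experts adds almost no parameters.
experts = [temporal,
           spatial,
           lambda z: spatial(temporal(z)),
           lambda z: temporal(spatial(z))]

# Toy router: softmax gate over experts from a pooled sequence feature.
W = rng.normal(size=(D, len(experts)))
logits = x.mean(axis=(0, 1)) @ W
gate = np.exp(logits - logits.max())
gate /= gate.sum()

# Gated mixture of expert outputs.
y = sum(g * e(x) for g, e in zip(gate, experts))
print(y.shape)  # → (8, 3, 4)
```

Because all four experts reuse the same two scan modules, the parameter count stays flat as experts are added, which is the mechanism behind the reported 41.38% parameter reduction; a sparse (top-k) router would additionally skip unselected experts at inference.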
Problem

Research questions and friction points this paper is trying to address.

How to flexibly capture the complex spatio-temporal dependencies of human motion without relying on positional encodings
How to reduce the high computational cost of attention-based methods with quadratic complexity
How to improve prediction accuracy while decreasing model parameters and training time
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses bidirectional spatiotemporal Mamba as experts
Incorporates four distinct spatiotemporal experts
Reduces computational cost and model parameters
Zheng Yin
Nanjing University of Science and Technology
Chengjian Li
Nanjing University of Science and Technology
Xiangbo Shu
Nanjing University of Science and Technology
Meiqi Cao
Nanjing University of Science and Technology
Rui Yan
Nanjing University of Science and Technology
Jinhui Tang
Nanjing Forestry University