SMamDiff: Spatial Mamba for Stochastic Human Motion Prediction

📅 2025-11-29
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the challenge of simultaneously ensuring spatiotemporal coherence, kinematic plausibility, and computational efficiency in single-stage diffusion-based human motion prediction (HMP), this paper introduces a residual discrete cosine transform (DCT) motion encoder and a skeleton-drawing spatial Mamba module. The former enhances high-frequency motion feature modeling, while the latter explicitly captures long-range inter-joint dependencies and anatomical joint-order constraints. Built upon a diffusion framework, our method integrates DCT-based motion representation, Mamba’s efficient sequence modeling, and biomechanics-informed kinematic priors to enable end-to-end, diverse, and physically plausible pose generation. Evaluated on Human3.6M and HumanEva, it achieves state-of-the-art performance among single-stage probabilistic methods, reduces inference latency by 42%, and cuts memory footprint by 38%. Its lightweight design enables deployment on edge devices, making it suitable for real-time intelligent perception applications such as service robotics.

Technology Category

Application Category

📝 Abstract
With intelligent room-side sensing and service robots widely deployed, human motion prediction (HMP) is essential for safe, proactive assistance. However, many existing HMP methods either produce a single, deterministic forecast that ignores uncertainty or rely on probabilistic models that sacrifice kinematic plausibility. Diffusion models improve the accuracy-diversity trade-off but often depend on multi-stage pipelines that are costly for edge deployment. This work focuses on how to ensure spatial-temporal coherence within a single-stage diffusion model for HMP. We introduce SMamDiff, a Spatial Mamba-based Diffusion model with two novel designs: (i) a residual-DCT motion encoding that subtracts the last observed pose before a temporal DCT, reducing the first DC component ($f=0$) dominance and highlighting informative higher-frequency cues so the model learns how joints move rather than where they are; and (ii) a stickman-drawing spatial-mamba module that processes joints in an ordered, joint-by-joint manner, making later joints condition on earlier ones to induce long-range, cross-joint dependencies. On Human3.6M and HumanEva, these coherence mechanisms deliver state-of-the-art results among single-stage probabilistic HMP methods while using less latency and memory than multi-stage diffusion baselines.
Problem

Research questions and friction points this paper is trying to address.

Ensuring spatial-temporal coherence in single-stage diffusion models
Improving accuracy-diversity trade-off in human motion prediction
Reducing latency and memory for edge deployment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Residual-DCT motion encoding reduces DC dominance
Stickman-drawing spatial-mamba induces cross-joint dependencies
Single-stage diffusion model ensures spatiotemporal coherence efficiently
🔎 Similar Papers
No similar papers found.
Junqiao Fan
Junqiao Fan
School of Electrical and Electronic Engineering, Nanyang Technological University
Radar SensingHuman SensingHealthcare AI
P
Pengfei Liu
School of Mechanical and Aerospace Engineering, Nanyang Technological University
H
Haocong Rao
School of Computer Science, Nanyang Technological University