Diffusion Model-based Activity Completion for AI Motion Capture from Videos

📅 2025-05-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing AI-based motion capture methods rely solely on observed video input and struggle to synthesize unseen motion segments. To address this, we propose a diffusion-based motion completion framework tailored for virtual human applications, enabling smooth, temporally coherent cross-segment motion synthesis. Our key contributions are: (1) the first diffusion architecture integrating a gating mechanism with joint spatiotemporal (position–time) embedding; (2) robust generation of missing transitional frames and long-sequence coherent modeling; and (3) the first approach to invert physically grounded sensor signals—specifically acceleration and angular velocity—from generated motion sequences. Evaluated on Human3.6M, our method achieves state-of-the-art performance in Average Displacement Error (ADE), Final Displacement Error (FDE), and Mean Motion ADE (MMADE), while employing only 16.84M parameters—41% fewer than HumanMAC—demonstrating significant improvements in motion naturalness and generation efficiency.

📝 Abstract
AI-based motion capture is an emerging technology that offers a cost-effective alternative to traditional motion capture systems. However, current AI motion capture methods rely entirely on observed video sequences, similar to conventional motion capture. This means that all human actions must be predefined, and movements outside the observed sequences are not possible. To address this limitation, we aim to apply AI motion capture to virtual humans, where flexible actions beyond the observed sequences are required. We assume that while many action fragments exist in the training data, the transitions between them may be missing. To bridge these gaps, we propose a diffusion-model-based action completion technique that generates complementary human motion sequences, ensuring smooth and continuous movements. By introducing a gate module and a position-time embedding module, our approach achieves competitive results on the Human3.6M dataset. Our experimental results show that (1) MDC-Net outperforms existing methods in ADE, FDE, and MMADE but is slightly less accurate in MMFDE, (2) MDC-Net has a smaller model size (16.84M) compared to HumanMAC (28.40M), and (3) MDC-Net generates more natural and coherent motion sequences. Additionally, we propose a method for extracting sensor data, including acceleration and angular velocity, from human motion sequences.
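The abstract's closing contribution, recovering sensor-style signals (acceleration and angular velocity) from a motion sequence, can be illustrated with a minimal finite-difference sketch. The function name, array layout, and the finite-difference approach are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def extract_sensor_signals(positions, angles, fps=50.0):
    """Approximate IMU-style signals from a motion sequence.

    positions: (T, 3) joint position track in meters
    angles:    (T, 3) joint orientation track in radians
    Returns linear acceleration (T-2, 3) and angular velocity (T-1, 3).
    """
    dt = 1.0 / fps
    velocity = np.diff(positions, axis=0) / dt       # first difference -> velocity
    acceleration = np.diff(velocity, axis=0) / dt    # second difference -> acceleration
    angular_velocity = np.diff(angles, axis=0) / dt  # first difference of orientation
    return acceleration, angular_velocity
```

In practice such derived signals are noisy, so a smoothing filter would typically precede the differencing; the sketch omits this for clarity.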
Problem

Research questions and friction points this paper is trying to address.

AI motion capture lacks flexibility for unobserved human actions
Missing transitions between action fragments in training data
Generating smooth, temporally coherent motion sequences with diffusion models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion-model-based action completion technique
Gate module and position-time embedding module
Extraction of sensor data (acceleration, angular velocity) from motion sequences
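The paper does not detail the gate module or the position-time embedding here, but the two ideas can be sketched under common assumptions: a transformer-style sinusoidal embedding over the frame index, and a learned sigmoid gate that blends transformed features with a residual path. All names and shapes below are illustrative, not the authors' architecture:

```python
import numpy as np

def position_time_embedding(num_frames, dim):
    """Sinusoidal embedding over frame index (standard transformer-style
    construction; assumes an even embedding dimension)."""
    pos = np.arange(num_frames)[:, None]                          # (T, 1)
    freqs = np.exp(-np.log(10000.0) * np.arange(0, dim, 2) / dim) # (dim/2,)
    emb = np.zeros((num_frames, dim))
    emb[:, 0::2] = np.sin(pos * freqs)
    emb[:, 1::2] = np.cos(pos * freqs)
    return emb

def gated_update(features, transformed, w_gate, b_gate):
    """Gate module sketch: a per-feature sigmoid gate decides how much of
    the transformed signal passes versus the residual input features."""
    gate = 1.0 / (1.0 + np.exp(-(features @ w_gate + b_gate)))
    return gate * transformed + (1.0 - gate) * features
```

Inside a denoising block, the embedding would be added to the per-frame joint features so the network can distinguish observed frames from the transitional frames it must complete, while the gate controls how strongly each feature is rewritten at every layer.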
Gao Huayu
Kyushu Institute of Technology
Tengjiu Huang
Kyushu Institute of Technology
Xiaolong Ye
Kyushu Institute of Technology
Tsuyoshi Okita
Kyushu Institute of Technology
Generative AI · Deep Learning · Artificial Intelligence · IoT · Natural Language Processing