🤖 AI Summary
Addressing the challenges of cross-skeleton motion transfer, and of temporal inconsistency and point indistinguishability in temporal point clouds (TPCs), this paper proposes PUMPS, the first general-purpose motion pretraining framework for sequential point clouds. Methodologically, it introduces: (1) a learnable, skeleton-agnostic point representation that uses latent Gaussian-noise vectors as sampling identifiers; (2) a lightweight linear-assignment strategy for point correspondence, replacing computationally expensive point-wise attention mechanisms; and (3) a frame-level point cloud encoder and latent-space decoder, jointly optimised via self-supervised pretraining and task-specific fine-tuning. Without native labeled data, PUMPS matches state-of-the-art performance directly after pretraining; when fine-tuned, it outperforms many dedicated methods on downstream tasks, including motion denoising and motion estimation, demonstrating strong generalisability across diverse skeletal configurations and motion dynamics.
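The summary's point (1) — decoding distinct points from a single frame feature by conditioning on Gaussian-noise sampling identifiers — can be illustrated with a minimal sketch. The shapes, the random linear maps standing in for a trained decoder, and the function name `decode_point` are all illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: one frame feature, n_points sampled points per frame.
feat_dim, noise_dim, n_points = 32, 8, 5
frame_feature = rng.normal(size=feat_dim)  # stands in for the encoder's output

# Random weights stand in for a trained decoder MLP (illustration only).
W_feat = rng.normal(size=(3, feat_dim)) * 0.1
W_noise = rng.normal(size=(3, noise_dim)) * 0.1

def decode_point(feature, noise_id):
    """Map (frame feature, Gaussian noise identifier) -> one 3D point."""
    return W_feat @ feature + W_noise @ noise_id

# Each distinct noise vector acts as the identifier for one decoded point,
# so one fixed frame feature yields a whole set of distinct points.
noise_ids = rng.normal(size=(n_points, noise_dim))
points = np.stack([decode_point(frame_feature, z) for z in noise_ids])
print(points.shape)
```

The key property the sketch demonstrates is that the decoder is queried per point: sampling more noise identifiers yields more points from the same frame feature, with no fixed point ordering.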
📝 Abstract
Motion skeletons drive 3D character animation by transforming bone hierarchies, but differences in proportions or structure make motion data hard to transfer across skeletons, posing challenges for data-driven motion synthesis. Temporal Point Clouds (TPCs) offer an unstructured, cross-compatible motion representation. Though reversibly convertible to and from skeletons, TPCs have mainly served as a compatibility format rather than a medium for learning motion tasks directly. Learning on TPCs would require data synthesis capabilities for the format, which raises unexplored challenges around temporal consistency and point identifiability. We therefore propose PUMPS, the primordial autoencoder architecture for TPC data. PUMPS independently reduces frame-wise point clouds into sampleable feature vectors, from which a decoder extracts distinct temporal points using latent Gaussian noise vectors as sampling identifiers. We introduce linear assignment-based point pairing to optimise the TPC reconstruction process, avoiding expensive point-wise attention mechanisms in the architecture. Using these latent features, we pre-train a motion synthesis model capable of motion prediction, transition generation, and keyframe interpolation. On these pre-training tasks, PUMPS performs remarkably well even without native dataset supervision, matching state-of-the-art performance. When fine-tuned for motion denoising or estimation, PUMPS outperforms many respective methods without deviating from its generalist architecture.
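The linear assignment-based point pairing mentioned above can be sketched with a standard Hungarian matcher: decoded points carry no canonical ordering, so each predicted point is paired one-to-one with its cheapest ground-truth counterpart before a reconstruction loss is computed. This is a minimal sketch of the general technique using `scipy.optimize.linear_sum_assignment`; the toy data, cost choice (squared Euclidean distance), and loss are assumptions, not the paper's exact formulation:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(1)
n = 6
target = rng.normal(size=(n, 3))  # one frame of ground-truth points

# Predicted points: a shuffled, slightly perturbed copy of the targets,
# mimicking an order-free decoder output.
pred = target[rng.permutation(n)] + 0.01 * rng.normal(size=(n, 3))

# Pairwise squared distances between predicted and target points.
cost = ((pred[:, None, :] - target[None, :, :]) ** 2).sum(axis=-1)

# Hungarian matching yields a one-to-one pairing in O(n^3),
# sidestepping any point-wise attention over the two sets.
row, col = linear_sum_assignment(cost)
loss = cost[row, col].mean()
print(loss)
```

Because the perturbation is small, the optimal assignment recovers the shuffled correspondence and the loss reduces to the per-point noise, so the pairing step, not point identity, carries the supervision.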