🤖 AI Summary
Existing rigid motion transfer methods rely on geometric, generative, or physical priors, forcing a trade-off between generalizability and temporal coherence. This paper proposes a zero-shot framework for rigid motion transfer from a monocular video to a single-view image. Our core innovation is the construction of an *internally shared spatiotemporal transformation prior*: we decouple motion from geometric semantics via 3D spatial mapping, enforce spatiotemporal consistency through learnable positional encoding, model controllable velocity fields, and refine them with position-based dynamical optimization—all without external supervision. This enables cross-object, zero-shot motion transfer. Experiments demonstrate that our method generates high-fidelity, temporally coherent motion videos across diverse object categories, significantly improving visual consistency and inference efficiency compared to prior approaches.
📝 Abstract
We present Motion Marionette, a zero-shot framework for rigid motion transfer from monocular source videos to single-view target images. Previous works typically employ geometric, generative, or simulation priors to guide the transfer process, but these external priors introduce auxiliary constraints that lead to trade-offs between generalizability and temporal consistency. To address these limitations, we propose guiding the motion transfer process through an internal prior that exclusively captures the spatial-temporal transformations and is shared between the source video and any transferred target video. Specifically, we first lift both the source video and the target image into a unified 3D representation space. Motion trajectories are then extracted from the source video to construct a spatial-temporal (SpaT) prior that is independent of object geometry and semantics, encoding relative spatial variations over time. This prior is further integrated with the target object to synthesize a controllable velocity field, which is subsequently refined using Position-Based Dynamics to mitigate artifacts and enhance visual coherence. The resulting velocity field can be flexibly employed for efficient video production. Empirical results demonstrate that Motion Marionette generalizes across diverse objects, produces temporally consistent videos that align well with the source motion, and supports controllable video generation.
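To make the idea of a geometry-independent prior that "encodes relative spatial variations over time" concrete, here is a minimal sketch of one standard way such relative transformations could be extracted and replayed on a new object: estimating per-frame rigid transforms from a tracked source point cloud (via the Kabsch/Procrustes algorithm) and applying them to a target point cloud. This is an illustrative assumption, not the paper's actual SpaT prior, velocity-field synthesis, or Position-Based Dynamics refinement.

```python
import numpy as np

def rigid_transform(src, dst):
    """Estimate rotation R and translation t minimizing ||src @ R.T + t - dst||
    for corresponding 3D points (Kabsch algorithm; assumed, not the paper's method)."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)          # cross-covariance of centered points
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_dst - R @ c_src
    return R, t

def transfer_motion(source_frames, target_pts):
    """Replay the source's frame-to-frame rigid motion on a target point cloud,
    independent of the target's geometry or semantics."""
    frames = [target_pts]
    for prev, curr in zip(source_frames[:-1], source_frames[1:]):
        R, t = rigid_transform(prev, curr)       # relative transform between frames
        frames.append(frames[-1] @ R.T + t)      # apply it to the evolving target
    return frames
```

In this toy setting the "prior" is just the sequence of relative `(R, t)` pairs, which is shared between source and target by construction; the actual framework additionally handles non-corresponding geometry and refines the resulting velocity field for visual coherence.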