🤖 AI Summary
This work proposes FlowMotion, a training-free and efficient framework for video motion transfer that avoids the high computational cost and limited flexibility of existing methods that rely on intermediate features from pre-trained text-to-video models. FlowMotion identifies and exploits the rich temporal information embedded in the early-stage latent predictions of flow-based diffusion models, extracting motion representations directly from these predictions to align motion between source and generated videos. With an added velocity regularization strategy that promotes smooth motion, FlowMotion achieves generation quality comparable to state-of-the-art approaches while significantly reducing computational overhead, improving both efficiency and adaptability.
📝 Abstract
Video motion transfer aims to generate a target video that inherits motion patterns from a source video while rendering new scenes. Existing training-free approaches focus on constructing motion guidance based on the intermediate outputs of pre-trained T2V models, which results in heavy computational overhead and limited flexibility. In this paper, we present FlowMotion, a novel training-free framework that enables efficient and flexible motion transfer by directly leveraging the predicted outputs of flow-based T2V models. Our key insight is that early latent predictions inherently encode rich temporal information. Motivated by this, we propose flow guidance, which extracts motion representations based on latent predictions to align motion patterns between source and generated videos. We further introduce a velocity regularization strategy to stabilize optimization and ensure smooth motion evolution. By operating purely on model predictions, FlowMotion achieves superior time and resource efficiency as well as competitive performance compared with state-of-the-art methods.
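To make the idea concrete, here is a minimal sketch of what prediction-based flow guidance with velocity regularization could look like. All names are hypothetical and the frame-to-frame latent difference is a stand-in motion representation, not the paper's actual formulation; the point is only the shape of the objective: align the motion of generated latents to the source, and penalize abrupt velocity changes for smoothness.

```python
import numpy as np

def motion_repr(latents):
    # latents: array of shape (T, C, H, W) -- early latent predictions
    # for T video frames. Hypothetical motion representation:
    # frame-to-frame differences in latent space.
    return latents[1:] - latents[:-1]

def flow_guidance_loss(src_latents, gen_latents):
    # Align motion patterns between source and generated videos by
    # matching their latent motion representations (MSE).
    m_src = motion_repr(src_latents)
    m_gen = motion_repr(gen_latents)
    return float(np.mean((m_src - m_gen) ** 2))

def velocity_regularization(gen_latents, weight=0.1):
    # Penalize frame-to-frame changes in velocity (i.e., acceleration
    # in latent space) to encourage smooth motion evolution.
    velocity = gen_latents[1:] - gen_latents[:-1]
    accel = velocity[1:] - velocity[:-1]
    return weight * float(np.mean(accel ** 2))

def guidance_objective(src_latents, gen_latents, reg_weight=0.1):
    # Combined objective that guidance would minimize at each
    # denoising step (the weighting scheme here is illustrative).
    return (flow_guidance_loss(src_latents, gen_latents)
            + velocity_regularization(gen_latents, reg_weight))
```

In an actual sampler, a gradient of this objective with respect to the generated latents would be used to steer each denoising step; since it depends only on model predictions, no intermediate attention or feature maps need to be cached.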