I2VControl: Disentangled and Unified Video Motion Synthesis Control

📅 2024-11-26
🏛️ arXiv.org
📈 Citations: 7
Influential: 0
🤖 AI Summary
In video synthesis, multi-motion control—such as camera motion, object dragging, and motion brushes—often suffers from signal coupling, leading to logical conflicts, poor controllability, and limited generalization. To address this, we propose I2VControl, the first framework enabling decoupled modeling of motion units with a unified control interface. It decomposes videos into independent motion components and separately encodes multimodal control signals—including text, pose, and optical flow—before fusing them via conditional adapters. Designed as a plug-in module, I2VControl is compatible with diverse pre-trained diffusion models without architectural modification. It supports cross-task and cross-model plug-and-play deployment, allowing flexible composition of arbitrary control conditions and overcoming single-constraint bottlenecks. Experiments demonstrate state-of-the-art performance across multiple motion control tasks, achieving significant improvements in control accuracy, temporal consistency, and generalization to unseen control combinations.
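The core idea in the summary above, partitioning a frame into independent motion units and fusing their control signals into one unified representation, can be illustrated with a minimal sketch. Everything here (the `MotionUnit` class, the dense displacement-field representation, the conflict check) is an illustrative assumption for exposition, not the paper's actual API or trajectory format:

```python
# Hypothetical sketch of disentangled motion units with a unified control
# interface: the frame is partitioned into non-overlapping spatial units,
# each carrying its own control signal (here a single 2D displacement), and
# the per-unit signals are composed into one dense trajectory field.
# Names and representations are illustrative, not from the paper.

from dataclasses import dataclass


@dataclass
class MotionUnit:
    """One spatial region with its own disentangled control signal."""
    name: str
    pixels: set          # set of (x, y) coordinates owned by this unit
    displacement: tuple  # (dx, dy) applied to every pixel of the unit


def compose_controls(units, width, height):
    """Fuse per-unit controls into one dense displacement field.

    Because the units partition space (no pixel belongs to two units),
    the composed field is conflict-free by construction; overlapping
    claims raise an error instead of silently coupling signals.
    """
    claimed = {}
    traj_field = {}
    for unit in units:
        for p in unit.pixels:
            if p in claimed:
                raise ValueError(
                    f"conflict: {p} claimed by {claimed[p]} and {unit.name}")
            claimed[p] = unit.name
            traj_field[p] = unit.displacement
    # pixels owned by no unit default to zero (static) motion
    for x in range(width):
        for y in range(height):
            traj_field.setdefault((x, y), (0.0, 0.0))
    return traj_field


# Example: a camera pan drives the background while an object drag drives
# a small region, without the two signals interfering.
bg_pixels = {(x, y) for x in range(4) for y in range(4)} - {(1, 1), (2, 1)}
bg = MotionUnit("camera_pan", bg_pixels, (1.0, 0.0))
obj = MotionUnit("drag_object", {(1, 1), (2, 1)}, (0.0, -2.0))
traj = compose_controls([bg, obj], 4, 4)
```

The spatial partition is what makes composition conflict-free: each pixel is driven by exactly one control source, so arbitrary combinations of controls (camera, drag, brush) reduce to choosing a partition and a signal per unit.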

📝 Abstract
Video synthesis techniques are undergoing rapid progress, with controllability being a significant aspect of practical usability for end-users. Although text conditioning is an effective way to guide video synthesis, capturing the correct joint distribution between text descriptions and video motion remains a substantial challenge. In this paper, we present a disentangled and unified framework, namely I2VControl, that unifies multiple motion control tasks in image-to-video synthesis. Our approach partitions the video into individual motion units and represents each unit with disentangled control signals, which allows various control types to be flexibly combined within our single system. Furthermore, our methodology seamlessly integrates as a plug-in for pre-trained models and remains agnostic to specific model architectures. We conduct extensive experiments, achieving excellent performance on various control tasks, and our method further facilitates user-driven creative combinations, enhancing innovation and creativity. The project page is: https://wanquanf.github.io/I2VControl
Problem

Research questions and friction points this paper is trying to address.

Overcoming logical conflicts in diverse video motion control
Unifying camera, object, and motion controls via point trajectories
Enabling dynamic orchestration of control types without conflicts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Disentangled unified framework for video motion control
Spatial partitioning strategy for conflict-free synthesis
Adapter structure for pre-trained model compatibility
👥 Authors
Wanquan Feng (USTC): computer vision
Tianhao Qi (PhD, University of Science and Technology of China): cross-modal generation, object detection
Jiawei Liu (ByteDance China)
Mingzhen Sun (ByteDance China; Institute of Automation, Chinese Academy of Sciences (CASIA))
Pengqi Tu (ByteDance China)
Tianxiang Ma (ByteDance Inc.; NLPR, CASIA): computer vision, deep learning, AIGC
Fei Dai (ByteDance China)
Songtao Zhao (ByteDance China)
Siyu Zhou (ByteDance China)
Qian He (ByteDance)