🤖 AI Summary
In video synthesis, multi-motion control—such as camera motion, object dragging, and motion brushes—often suffers from signal coupling, leading to logical conflicts, poor controllability, and limited generalization. To address this, we propose I2VControl, the first framework to enable decoupled modeling of motion units behind a unified control interface. It decomposes videos into independent motion components and separately encodes multimodal control signals—including text, pose, and optical flow—before fusing them via conditional adapters. Designed as a plug-in module, I2VControl is compatible with diverse pre-trained diffusion models without architectural modification. It supports cross-task and cross-model plug-and-play deployment, allowing flexible composition of arbitrary control conditions and overcoming the bottleneck of single-constraint systems. Experiments demonstrate state-of-the-art performance across multiple motion control tasks, with significant improvements in control accuracy, temporal consistency, and generalization to unseen control combinations.
📝 Abstract
Video synthesis techniques are undergoing rapid progress, and controllability is a significant aspect of practical usability for end users. Although text conditioning is an effective way to guide video synthesis, capturing the correct joint distribution between text descriptions and video motion remains a substantial challenge. In this paper, we present a disentangled and unified framework, namely I2VControl, that unifies multiple motion control tasks in image-to-video synthesis. Our approach partitions the video into individual motion units and represents each unit with disentangled control signals, which allows various control types to be flexibly combined within our single system. Furthermore, our methodology integrates seamlessly as a plug-in for pre-trained models and remains agnostic to specific model architectures. We conduct extensive experiments, achieving excellent performance on various control tasks, and our method further facilitates user-driven creative combinations, enhancing innovation and creativity. The project page is: https://wanquanf.github.io/I2VControl.