OmniDirector: General Multi-Shot Camera Cloning without Cross-Paired Data

📅 2026-06-11
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing methods for cloning complex multi-shot camera motions are limited either by the insufficient representational capacity of parametric camera representations or by their reliance on scarce cross-paired data. This work proposes the "camera grid video," a general-purpose visual representation of camera motion, and introduces OmniDirector, a unified framework trained on millions of camera grid–video pairs without requiring cross-paired data. Built on a multimodal diffusion Transformer, OmniDirector combines the camera grid representation with a hierarchical prompt expansion mechanism to give director-level, coordinated control over characters, actions, and camera movements. Experiments show that the approach outperforms current state-of-the-art methods in complex camera motion cloning, with both high fidelity and strong controllability.
πŸ“ Abstract
Cloning camera motion from reference videos is an important task in video generation, as videos provide intuitive and precise control. Existing methods either directly use parametric representations, which fail to handle multi-shot generation, or synthesize cross-paired data and therefore suffer from data scarcity; both lead to poor performance when cloning complicated camera motion. To address these issues, we introduce a general camera motion representation that encodes cameras as grid motion videos. This camera grid represents the camera parameters visually and supports the integration of diverse trajectories for multi-shot video generation. Building on it, we propose OmniDirector, a unified framework trained on million-scale camera grid–video pairs that coordinates characters, actions, and cameras to provide director-level control for multimodal diffusion transformers. Furthermore, we design a novel hierarchical prompt expansion agent that harmoniously integrates the different control signals by understanding their relationships and systematically describing both camera motion and visual content. Extensive experiments demonstrate the superior performance and outstanding controllability of our framework. Project page: https://ymlinfeng.github.io/OmniDirector.github.io/
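The abstract does not say how the camera grid is constructed. As a rough illustration of the general idea of "representing camera parameters visually," the sketch below assumes a fixed 3D lattice of points projected through a simple pinhole camera (position plus yaw only) at every frame. The per-frame 2D point layouts then form a "grid video," and multi-shot trajectories are concatenated along time with the cut indices recorded. All names here (`Camera`, `make_grid`, `render_grid_video`, `multi_shot`) are hypothetical and are not OmniDirector's actual representation or API.

```python
import math
from dataclasses import dataclass


@dataclass
class Camera:
    # Hypothetical per-frame pinhole camera: world position, yaw (radians), focal length.
    x: float
    y: float
    z: float
    yaw: float
    focal: float = 1.0


def make_grid(n=5, spacing=1.0, depth=5.0):
    # Fixed n x n lattice of 3D points on a plane at z = depth, centred on the optical axis.
    half = (n - 1) / 2
    return [((i - half) * spacing, (j - half) * spacing, depth)
            for i in range(n) for j in range(n)]


def project(point, cam):
    # World -> camera: translate by the camera position, then rotate by -yaw about the y axis.
    px, py, pz = point[0] - cam.x, point[1] - cam.y, point[2] - cam.z
    c, s = math.cos(-cam.yaw), math.sin(-cam.yaw)
    xc = c * px + s * pz
    zc = -s * px + c * pz
    if zc <= 1e-6:
        return None  # point is behind the camera, so it is not drawn in this frame
    return (cam.focal * xc / zc, cam.focal * py / zc)


def render_grid_video(cams, grid):
    # One "frame" per camera: the projected 2D positions of every grid point.
    return [[project(p, cam) for p in grid] for cam in cams]


def multi_shot(shots, grid):
    # Concatenate per-shot grid videos along time and remember where each shot starts.
    frames, boundaries = [], []
    for shot in shots:
        boundaries.append(len(frames))
        frames.extend(render_grid_video(shot, grid))
    return frames, boundaries


grid = make_grid()
dolly = [Camera(0.0, 0.0, 0.5 * t, 0.0) for t in range(4)]  # shot 1: dolly in
pan = [Camera(0.0, 0.0, 0.0, 0.1 * t) for t in range(4)]    # shot 2: pan
frames, cuts = multi_shot([dolly, pan], grid)
print(len(frames), cuts)  # 8 [0, 4]
```

In this toy version, a dolly shows up as grid points spreading outward from the centre, and a pan shows up as the whole lattice sliding sideways. Rasterizing the point sets into image frames would be one way to obtain an actual video that a diffusion model could condition on; that step is omitted here.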
Problem

Research questions and friction points this paper is trying to address.

camera motion cloning
multi-shot video generation
cross-paired data
video generation
camera control
Innovation

Methods, ideas, or system contributions that make the work stand out.

camera motion cloning
multi-shot video generation
camera grid representation
multimodal diffusion transformer
hierarchical prompt expansion