🤖 AI Summary
Existing video generation methods suffer from insufficient precision in camera motion control and neglect explicit modeling of subject motion dynamics, failing to meet professional-grade controllability requirements. To address this, we propose a high-precision, disentangled framework for joint camera and subject control. Our approach introduces 3D point trajectories in the camera coordinate system as control signals, explicitly modeling high-order motion dynamics—including acceleration and jerk—and incorporates an adjustable motion scaling operator. We adopt a lightweight, base-model-agnostic Adapter-based fine-tuning architecture. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art approaches on both static and dynamic scenes. Quantitative evaluations show consistent improvements across key metrics (e.g., CAM-PSNR, Motion-FID), while qualitative results exhibit markedly more accurate camera choreography and natural, fine-grained controllability over subject motion.
📝 Abstract
Video generation technologies are developing rapidly and have broad potential applications. Among these, camera control is crucial for generating professional-quality videos that accurately meet user expectations. However, existing camera control methods still suffer from several limitations, including limited control precision and the neglect of subject motion dynamics. In this work, we propose I2VControl-Camera, a novel camera control method that significantly enhances controllability while providing adjustability over the strength of subject motion. To improve control precision, we employ point trajectories in the camera coordinate system, rather than extrinsic matrices alone, as our control signal. To accurately control and adjust the strength of subject motion, we explicitly model the higher-order components of the video trajectory expansion, not merely the linear terms, and design an operator that effectively represents motion strength. We adopt an adapter architecture that is independent of the base model structure. Experiments on static and dynamic scenes show that our framework outperforms previous methods both quantitatively and qualitatively. The project page is: https://wanquanf.github.io/I2VControlCamera .
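To make the idea of "higher-order trajectory components" and a "motion-strength operator" concrete, here is a minimal NumPy sketch. It is not the paper's implementation; the function names, the finite-difference estimates, and the specific form of the scaling operator (keep the linear term, scale the higher-order residual) are illustrative assumptions consistent with the abstract's description.

```python
import numpy as np

def motion_components(traj, dt=1.0):
    """Estimate velocity, acceleration, and jerk of 3D point trajectories.

    traj: array of shape (T, N, 3) -- N points tracked over T frames,
    expressed in the camera coordinate system (as the method proposes).
    Uses simple finite differences; the paper's actual estimator may differ.
    """
    vel = np.gradient(traj, dt, axis=0)    # first-order (linear) motion
    acc = np.gradient(vel, dt, axis=0)     # second-order component
    jerk = np.gradient(acc, dt, axis=0)    # third-order component
    return vel, acc, jerk

def scale_subject_motion(traj, strength, dt=1.0):
    """Hypothetical motion-strength operator.

    Rebuilds the trajectory as (first-order expansion around frame 0)
    + strength * (higher-order residual), so strength=0 keeps only the
    linear motion while strength=1 reproduces the input trajectory.
    """
    t = np.arange(traj.shape[0], dtype=float)[:, None, None] * dt
    v0 = (traj[1] - traj[0]) / dt           # initial velocity estimate, (N, 3)
    linear = traj[0][None] + t * v0[None]   # linear term of the expansion
    residual = traj - linear                # higher-order dynamics
    return linear + strength * residual
```

Intermediate `strength` values would then interpolate between a "frozen-dynamics" trajectory and the original one, which is one plausible reading of the adjustable motion scaling the abstract describes.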