AI Summary
Existing video generation methods rely on fine-tuning, depth estimation, or inpainting to model camera motion, incurring high computational overhead and motion inconsistency. This paper proposes LightMotion, a lightweight, fine-tuning-free method that simulates camera motion directly in latent space. It introduces explicit permutation operations to model translation, scaling, and rotation, and designs a background-aware, cross-frame-aligned resampling strategy to preserve content consistency. The authors further identify an SNR shift induced by motion in the latent space and propose a noise re-injection mechanism for latent-space correction. LightMotion enables efficient inference solely through implicit control of pretrained diffusion models, without auxiliary modules. Quantitatively, it achieves state-of-the-art FID, FVD, and perceptual quality scores; qualitatively, it markedly improves motion coherence and computational efficiency while eliminating dependence on external components.
Abstract
Existing camera-motion-controlled video generation methods face computational bottlenecks in fine-tuning and inference. This paper proposes LightMotion, a light and tuning-free method for simulating camera motion in video generation. Operating in the latent space, it eliminates additional fine-tuning, inpainting, and depth estimation, making it more streamlined than existing methods. The contributions of this paper are: (i) a latent-space permutation operation that effectively simulates various camera motions such as panning, zooming, and rotation; (ii) a latent-space resampling strategy that combines background-aware sampling and cross-frame alignment to accurately fill in new perspectives while maintaining coherence across frames; (iii) an in-depth analysis showing that permutation and resampling cause an SNR shift in the latent space, leading to poor-quality generation; to address this, we propose a latent-space correction that reintroduces noise during denoising to mitigate the SNR shift and enhance video generation quality. Extensive experiments show that LightMotion outperforms existing methods, both quantitatively and qualitatively.
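To make the three ideas concrete, here is a minimal NumPy sketch of the general mechanism: a latent-grid permutation for panning, a stand-in fill for newly exposed regions, and a variance-matching noise re-injection to counter the SNR drop. This is an illustrative assumption, not the paper's implementation; the function names (`pan_latent`, `correct_snr`), the Gaussian fill in place of background-aware resampling, and the simple variance-matching rule are all hypothetical.

```python
import numpy as np


def pan_latent(latent, shift, rng):
    """Simulate a rightward camera pan by permuting the latent grid.

    latent: (C, H, W) latent array; shift: number of columns to move.
    The scene content shifts left; the newly exposed right-hand columns
    are filled with fresh Gaussian noise as a crude stand-in for the
    paper's background-aware, cross-frame-aligned resampling.
    """
    out = np.roll(latent, -shift, axis=2)  # camera pans right => content moves left
    out[:, :, -shift:] = rng.standard_normal(out[:, :, -shift:].shape)
    return out


def correct_snr(latent, target_std, rng):
    """Re-inject noise so the latent's std matches the schedule's target.

    Permutation and resampling can lower the effective noise level (an
    SNR shift); blending in fresh Gaussian noise restores it. The blend
    weight below is a simple variance-matching heuristic, assumed for
    illustration rather than taken from the paper.
    """
    cur_std = latent.std()
    if cur_std >= target_std:
        return latent
    extra = np.sqrt(target_std**2 - cur_std**2)
    return latent + extra * rng.standard_normal(latent.shape)


rng = np.random.default_rng(0)
z = rng.standard_normal((4, 64, 64))       # toy latent: 4 channels, 64x64 grid
z_panned = pan_latent(z, shift=8, rng=rng)
z_fixed = correct_snr(z_panned, target_std=1.0, rng=rng)
```

In an actual diffusion pipeline, the permuted-and-corrected latent would be fed back into the pretrained denoiser at the appropriate timestep, so no auxiliary module or fine-tuning is needed; the sketch only shows the latent-space arithmetic.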