GenDoP: Auto-regressive Camera Trajectory Generation as a Director of Photography

📅 2025-04-09
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing camera trajectory generation methods suffer from structural bias, insufficient text alignment, and limited creative expressiveness. This paper proposes a text- and RGBD-guided autoregressive cinematographic motion planning framework tailored for cinematic visual storytelling. We introduce DataDoP, the first multimodal cinematography dataset (29K real-world shots) featuring fine-grained motion semantics annotations. We further design the first decoder-only Transformer architecture with explicit multimodal alignment, enabling joint text-conditioned and RGBD-aware trajectory modeling. Our approach significantly enhances trajectory artistic quality, user controllability, and motion stability. Comprehensive qualitative and quantitative evaluations demonstrate consistent superiority over both geometry-based optimization methods and state-of-the-art learning-based approaches.

πŸ“ Abstract
Camera trajectory design plays a crucial role in video production, serving as a fundamental tool for conveying directorial intent and enhancing visual storytelling. In cinematography, Directors of Photography meticulously craft camera movements to achieve expressive and intentional framing. However, existing methods for camera trajectory generation remain limited: traditional approaches rely on geometric optimization or handcrafted procedural systems, while recent learning-based methods often inherit structural biases or lack textual alignment, constraining creative synthesis. In this work, we introduce an auto-regressive model inspired by the expertise of Directors of Photography to generate artistic and expressive camera trajectories. We first introduce DataDoP, a large-scale multi-modal dataset containing 29K real-world shots with free-moving camera trajectories, depth maps, and detailed captions describing specific movements, interactions with the scene, and directorial intent. Thanks to this comprehensive and diverse dataset, we further train an auto-regressive, decoder-only Transformer, named GenDoP, for high-quality, context-aware camera movement generation based on text guidance and RGBD inputs. Extensive experiments demonstrate that compared to existing methods, GenDoP offers better controllability, finer-grained trajectory adjustments, and higher motion stability. We believe our approach establishes a new standard for learning-based cinematography, paving the way for future advancements in camera control and filmmaking. Our project website: https://kszpxxzmc.github.io/GenDoP/.
Problem

Research questions and friction points this paper is trying to address.

Generating artistic camera trajectories for video production
Overcoming limitations in existing camera trajectory methods
Enhancing controllability and stability in camera movement generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Auto-regressive model for camera trajectory generation
Large-scale multi-modal dataset DataDoP
Decoder-only Transformer conditioned on text guidance and RGBD inputs
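An auto-regressive, decoder-only Transformer requires the continuous camera trajectory to be expressed as a discrete token sequence. The paper does not detail its tokenizer here, so the following is a minimal sketch of one common approach: uniformly binning each pose channel into a fixed vocabulary. The bin count, value ranges, and the 6-DoF parameterization (xyz translation plus Euler angles) are illustrative assumptions, not GenDoP's actual design.

```python
# Illustrative sketch: discretizing camera poses into tokens for an
# auto-regressive model. All constants below are assumptions, not the
# values used by GenDoP.

N_BINS = 256            # assumed per-channel vocabulary size
T_RANGE = (-5.0, 5.0)   # assumed translation range (metres)
R_RANGE = (-3.141592653589793, 3.141592653589793)  # rotation (radians)

def quantize(value, lo, hi, n_bins=N_BINS):
    """Map a continuous value in [lo, hi] to an integer bin index."""
    value = min(max(value, lo), hi)  # clamp out-of-range values
    return int((value - lo) / (hi - lo) * (n_bins - 1) + 0.5)

def dequantize(idx, lo, hi, n_bins=N_BINS):
    """Map a bin index back to its continuous value."""
    return lo + idx / (n_bins - 1) * (hi - lo)

def tokenize_pose(pose):
    """pose = (x, y, z, roll, pitch, yaw) -> six discrete tokens."""
    t_tokens = [quantize(v, *T_RANGE) for v in pose[:3]]
    r_tokens = [quantize(v, *R_RANGE) for v in pose[3:]]
    return t_tokens + r_tokens

def detokenize_pose(tokens):
    """Inverse of tokenize_pose, up to quantization error."""
    t = [dequantize(i, *T_RANGE) for i in tokens[:3]]
    r = [dequantize(i, *R_RANGE) for i in tokens[3:]]
    return t + r

# A trajectory of per-frame poses flattens into one token sequence that
# a decoder-only model can predict left to right, one token at a time.
trajectory = [(0.0, 0.0, 1.0, 0.0, 0.1, 0.0),
              (0.1, 0.0, 1.1, 0.0, 0.1, 0.05)]
token_seq = [tok for pose in trajectory for tok in tokenize_pose(pose)]
```

With the trajectory in token form, text and RGBD conditioning can be prepended as prefix tokens and standard next-token prediction applied, which is what makes the decoder-only formulation natural for this task.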