AI Summary
This work proposes the first end-to-end, multi-agent collaborative framework for automated previsualization in filmmaking, addressing the inefficiency of translating creative ideas into visuals and the heavy reliance on manual coordination. The framework emulates the decision-making workflow of a film production team through a multimodal agent collaboration mechanism that combines text-to-3D scene generation, character behavior control, and shot planning. A game engine provides real-time visualization, and the system generates semantically consistent, visually coherent previsualization sequences in approximately 25 minutes per idea. Human evaluations confirm the framework's effectiveness in automating prototype generation and facilitating human-AI collaborative creativity.
Abstract
We present Mind-of-Director, a multi-modal agent-driven framework for film previz that models the collaborative decision-making process of a film production team. Given a creative idea, Mind-of-Director orchestrates multiple specialized agents to produce previz sequences within a game engine. The framework consists of four cooperative modules: Script Development, where agents draft and refine the screenplay iteratively; Virtual Scene Design, which transforms text into semantically aligned 3D environments; Character Behaviour Control, which determines character blocking and motion; and Camera Planning, which optimizes framing, movement, and composition for cinematic camera effects. A real-time visual editing system built in the game engine further enables interactive inspection and synchronized timeline adjustment across scenes, behaviours, and cameras. Extensive experiments and human evaluations show that Mind-of-Director generates high-quality, semantically grounded previz sequences in approximately 25 minutes per idea, demonstrating the effectiveness of agent collaboration for both automated prototyping and human-in-the-loop filmmaking.
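The four-module flow described in the abstract can be pictured as a sequential pipeline over a shared state. The following is a minimal Python sketch under that assumption only. All names (`PrevizState`, `run_pipeline`, the stage functions) are hypothetical, and the stage bodies are placeholders, not the paper's agents, models, or game-engine integration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PrevizState:
    """Shared state handed from stage to stage (hypothetical structure)."""
    idea: str
    script: str = ""
    scene: Dict[str, str] = field(default_factory=dict)
    behaviours: List[str] = field(default_factory=list)
    shots: List[str] = field(default_factory=list)
    completed: List[str] = field(default_factory=list)


def script_development(state: PrevizState) -> PrevizState:
    # Placeholder for iterative drafting and refinement by script agents.
    state.script = f"Screenplay draft for: {state.idea}"
    state.completed.append("script_development")
    return state


def virtual_scene_design(state: PrevizState) -> PrevizState:
    # Placeholder for turning the script text into a 3D environment.
    state.scene = {"source": state.script, "layout": "placeholder"}
    state.completed.append("virtual_scene_design")
    return state


def character_behaviour_control(state: PrevizState) -> PrevizState:
    # Placeholder for deciding character blocking and motion.
    state.behaviours = ["placeholder blocking and motion"]
    state.completed.append("character_behaviour_control")
    return state


def camera_planning(state: PrevizState) -> PrevizState:
    # Placeholder for framing, movement, and composition decisions:
    # here, one shot is planned per behaviour entry.
    state.shots = [f"shot covering: {b}" for b in state.behaviours]
    state.completed.append("camera_planning")
    return state


# Assumed stage order, following the order the abstract lists the modules in.
STAGES: List[Callable[[PrevizState], PrevizState]] = [
    script_development,
    virtual_scene_design,
    character_behaviour_control,
    camera_planning,
]


def run_pipeline(idea: str) -> PrevizState:
    """Run every stage in order, passing the shared state along."""
    state = PrevizState(idea=idea)
    for stage in STAGES:
        state = stage(state)
    return state


if __name__ == "__main__":
    result = run_pipeline("a heist in a museum")
    print(result.completed)
```

Keeping one shared state object is a choice made for this sketch: it mirrors the idea that scenes, behaviours, and cameras stay aligned on a common timeline, but the source does not specify how the modules exchange data.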