Automated Movie Generation via Multi-Agent CoT Planning

📅 2025-03-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current long-video generation frameworks rely heavily on manual planning of storylines, scenes, cinematography, and character interactions, which makes production labor-intensive and slow. To address this, the paper proposes an automated paradigm for long-form cinematic video generation that synthesizes coherent multi-scene, multi-shot videos with synchronized subtitles and stable audio from a text script and a character bank. Methodologically, it introduces a hierarchical Chain-of-Thought (CoT) multi-agent planning framework in which LLM agents emulate professional roles (e.g., director, screenwriter) and reason across production stages, while keeping characters consistent throughout the film. Experiments report new state-of-the-art results in script faithfulness, character consistency, and narrative coherence, with substantially less human intervention.

📝 Abstract

Existing long-form video generation frameworks lack automated planning and require manual input for storylines, scenes, cinematography, and character interactions, resulting in high costs and inefficiencies. To address these challenges, we present MovieAgent, a framework for automated movie generation via multi-agent Chain of Thought (CoT) planning. MovieAgent offers two key advantages: 1) We are the first to explore and define the paradigm of automated movie/long-video generation. Given a script and a character bank, MovieAgent can generate multi-scene, multi-shot long-form videos with a coherent narrative, while ensuring character consistency, synchronized subtitles, and stable audio throughout the film. 2) MovieAgent introduces a hierarchical CoT-based reasoning process that automatically structures scenes, camera settings, and cinematography, significantly reducing human effort. By employing multiple LLM agents to simulate the roles of a director, screenwriter, storyboard artist, and location manager, MovieAgent streamlines the production pipeline. Experiments demonstrate that MovieAgent achieves new state-of-the-art results in script faithfulness, character consistency, and narrative coherence. Our hierarchical framework takes a step forward and provides new insights into fully automated movie generation. The code and project website are available at: https://github.com/showlab/MovieAgent and https://weijiawu.github.io/MovieAgent.
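As a rough illustration of the hierarchical, role-based planning the abstract describes (script to scenes to shots), here is a minimal Python sketch. It is not MovieAgent's actual code: the names `plan_movie`, `Scene`, `Shot`, and `fake_llm`, the `ROLE=` prompt prefix, and the `description | camera` line format are all hypothetical, and a canned stub stands in for a real LLM so the example runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-in for an LLM call: takes a prompt, returns text.
LLM = Callable[[str], str]


@dataclass
class Shot:
    description: str
    camera: str
    characters: List[str]


@dataclass
class Scene:
    summary: str
    location: str
    shots: List[Shot] = field(default_factory=list)


def plan_movie(script: str, character_bank: List[str], llm: LLM) -> List[Scene]:
    """Coarse-to-fine planning: script -> scenes -> shots (assumed structure)."""
    # Level 1 (director role): split the script into one scene summary per line.
    reply = llm(f"ROLE=director\nSplit into scenes, one per line:\n{script}")
    summaries = [s.strip() for s in reply.splitlines() if s.strip()]

    scenes = []
    for summary in summaries:
        # Level 2 (location manager role): choose a location for the scene.
        location = llm(f"ROLE=location_manager\nName a location for: {summary}").strip()
        scene = Scene(summary=summary, location=location)

        # Level 3 (storyboard artist role): break the scene into shots.
        reply = llm(
            "ROLE=storyboard_artist\n"
            f"List shots as 'description | camera' for: {summary} at {location}"
        )
        for line in reply.splitlines():
            if "|" not in line:
                continue  # skip lines that do not follow the assumed format
            description, camera = (part.strip() for part in line.split("|", 1))
            # Record which characters from the bank appear in this shot.
            present = [c for c in character_bank if c in description]
            scene.shots.append(Shot(description, camera, present))
        scenes.append(scene)
    return scenes


def fake_llm(prompt: str) -> str:
    """Canned replies keyed on the role line; replace with a real model call."""
    role = prompt.splitlines()[0]
    if role == "ROLE=director":
        return "Alice meets Bob at the station\nAlice and Bob argue at home"
    if role == "ROLE=location_manager":
        return "train station" if "station" in prompt else "living room"
    return "Alice waves | wide shot\nBob smiles | close-up"


if __name__ == "__main__":
    for scene in plan_movie("A short two-scene script.", ["Alice", "Bob"], fake_llm):
        print(scene.location, [(s.camera, s.characters) for s in scene.shots])
```

Each level consumes the output of the one above it, which is the sense in which the planning is hierarchical; a real system would replace `fake_llm` with model calls and hand the resulting shot plans to image, video, and audio generators.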

Problem

Research questions and friction points this paper is trying to address.

Existing long-form video generation frameworks lack automated planning, making production costly and inefficient.

Ensuring narrative coherence and character consistency in automated movie creation.

Automatically structuring scenes and cinematography to reduce human effort.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated movie generation via multi-agent CoT planning

Hierarchical CoT-based reasoning for scene structuring

Multiple LLM agents simulate film production roles

🔎 Similar Papers

Kubrick: Multimodal Agent Collaborations for Synthetic Video Generation