🤖 AI Summary
To address character inconsistency, audio-visual desynchronization, and poor scalability in long-form AI-generated animation, this paper proposes an adaptive multi-agent planning system for end-to-end animation generation. The system centers on a director agent that orchestrates specialized agents (script, character, speech, storyboard, and compositing), unified by a global memory mechanism for cross-stage state sharing. A customized Model Context Protocol (MCP) lets each agent dynamically perceive task context and adaptively select control constraints, markedly improving character consistency and audio-visual alignment. Combined with instruction orchestration, the system achieves fully automated, collaborative generation from textual narratives to high-fidelity video. Experiments demonstrate stable production of extended-duration animations with coherent character performances and precise audio-visual synchronization, establishing a scalable, modular, and highly controllable multi-agent framework for AI-native animation production.
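
The paper does not ship code with this summary, but the orchestration pattern it describes is concrete enough to sketch. Below is a minimal, hypothetical Python illustration of a director agent driving specialized agents through a shared global memory; every name here (`GlobalMemory`, `DirectorAgent`, `ScriptAgent`) and the key-value memory interface are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of the director-orchestrated pipeline described above.
# None of these class or method names come from the paper; they only show one
# way a director agent could run specialized agents in sequence while sharing
# cross-stage state through a global memory.

from dataclasses import dataclass, field
from typing import Any


@dataclass
class GlobalMemory:
    """Cross-stage state shared by every agent (story, script, assets, ...)."""
    state: dict[str, Any] = field(default_factory=dict)

    def write(self, key: str, value: Any) -> None:
        self.state[key] = value

    def read(self, key: str) -> Any:
        return self.state.get(key)


class Agent:
    """Base class for a specialized agent; subclasses do the real generation."""
    def run(self, memory: GlobalMemory) -> None:
        raise NotImplementedError


class ScriptAgent(Agent):
    def run(self, memory: GlobalMemory) -> None:
        story = memory.read("story")
        # Placeholder for LLM-based script expansion.
        memory.write("script", f"[script derived from: {story}]")


class DirectorAgent:
    """Runs each stage in order; every stage reads/writes the same memory."""
    def __init__(self, stages: list[Agent]):
        self.stages = stages

    def produce(self, story: str, memory: GlobalMemory) -> GlobalMemory:
        memory.write("story", story)
        for agent in self.stages:
            agent.run(memory)  # each agent sees all upstream decisions
        return memory


# Character, speech, storyboard, and compositing agents are omitted here.
pipeline = DirectorAgent([ScriptAgent()])
result = pipeline.produce("A courier crosses a flooded city.", GlobalMemory())
print(result.read("script"))
```
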
📝 Abstract
We present AniME, a director-oriented multi-agent system for automated long-form anime production that covers the full workflow from story to final video. The director agent maintains a global memory for the entire workflow and coordinates several downstream specialized agents. By integrating a customized Model Context Protocol (MCP) with downstream model instructions, each specialized agent adaptively selects control conditions for its diverse sub-tasks. AniME produces cinematic animation with consistent characters and synchronized audio-visual elements, offering a scalable solution for AI-driven anime creation.
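
To make the MCP-driven selection of control conditions more concrete, here is an equally hypothetical sketch: an agent inspects the shot context it receives and picks which control constraints to pass to the downstream model. The selection rules and condition names (`character_reference`, `lip_sync_audio`, and so on) are invented for illustration; the paper's customized MCP integration is not public.

```python
# Hypothetical illustration of adaptive control-condition selection.
# The context keys, shot types, and condition names below are assumptions,
# not the paper's actual protocol.

# Map from a perceived sub-task context to the control conditions the
# downstream generation model would be conditioned on.
CONTROL_RULES: dict[str, list[str]] = {
    "dialogue_shot": ["character_reference", "lip_sync_audio"],
    "action_shot": ["character_reference", "pose_sequence"],
    "establishing_shot": ["style_reference"],
}


def select_controls(context: dict) -> list[str]:
    """Pick control conditions from the shot context (a stand-in for the
    MCP-mediated task context the paper's agents would perceive)."""
    shot_type = context.get("shot_type", "establishing_shot")
    controls = list(CONTROL_RULES.get(shot_type, []))
    if context.get("has_speech") and "lip_sync_audio" not in controls:
        controls.append("lip_sync_audio")
    return controls


print(select_controls({"shot_type": "action_shot", "has_speech": True}))
# -> ['character_reference', 'pose_sequence', 'lip_sync_audio']
```
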