🤖 AI Summary
Existing document-to-presentation generation methods focus largely on content quality and neglect visual design and structural coherence, which limits their practical utility. To address this, we propose PPTAgent, a two-stage, edit-based generation framework inspired by human workflows. In Stage I, large language models analyze reference presentations to learn their structural patterns and content schemas. In Stage II, the system drafts an outline and generates slides by applying code-driven editing actions to reference slides, keeping content, layout, and style consistent across slides. We further introduce PPTEval, an evaluation framework with interpretable, multi-faceted metrics that assesses generated presentations along three dimensions: content accuracy, soundness of visual design, and cross-slide coherence. Experiments show that our method significantly outperforms traditional automatic presentation generation methods on all three dimensions. The code and data are open-sourced.
📝 Abstract
Automatically generating presentations from documents is a challenging task that requires balancing content quality, visual design, and structural coherence. Existing methods primarily focus on improving and evaluating the content quality in isolation, often overlooking visual design and structural coherence, which limits their practical applicability. To address these limitations, we propose PPTAgent, which comprehensively improves presentation generation through a two-stage, edit-based approach inspired by human workflows. PPTAgent first analyzes reference presentations to understand their structural patterns and content schemas, then drafts outlines and generates slides through code actions to ensure consistency and alignment. To comprehensively evaluate the quality of generated presentations, we further introduce PPTEval, an evaluation framework that assesses presentations across three dimensions: Content, Design, and Coherence. Experiments show that PPTAgent significantly outperforms traditional automatic presentation generation methods across all three dimensions. The code and data are available at https://github.com/icip-cas/PPTAgent.
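The edit-based idea in the abstract (start from a reference slide and modify it through code actions instead of generating a slide from scratch) can be illustrated with a minimal Python sketch. Everything in it is hypothetical: `Slide`, `Element`, `extract_schema`, `replace_text`, `delete_element`, and `generate_slide` are illustrative names, not PPTAgent's actual API. A real slide would also carry positions, fonts, and images rather than plain text.

```python
from copy import deepcopy
from dataclasses import dataclass, field


@dataclass
class Element:
    """One shape on a slide (hypothetical; reduced to an id and its text)."""
    element_id: str
    text: str


@dataclass
class Slide:
    layout: str  # hypothetical layout label, e.g. "title+bullets"
    elements: list[Element] = field(default_factory=list)


def extract_schema(reference: Slide) -> list[str]:
    """Stage I (sketch): list the content slots a reference slide exposes."""
    return [e.element_id for e in reference.elements]


def replace_text(slide: Slide, element_id: str, text: str) -> None:
    """Atomic edit action (hypothetical): overwrite one element's text."""
    for e in slide.elements:
        if e.element_id == element_id:
            e.text = text
            return
    raise KeyError(element_id)


def delete_element(slide: Slide, element_id: str) -> None:
    """Atomic edit action (hypothetical): remove one element."""
    before = len(slide.elements)
    slide.elements = [e for e in slide.elements if e.element_id != element_id]
    if len(slide.elements) == before:
        raise KeyError(element_id)


def generate_slide(reference: Slide, content: dict[str, str]) -> Slide:
    """Stage II (sketch): clone the reference, then edit it slot by slot,
    so the new slide inherits the reference's layout and styling."""
    slide = deepcopy(reference)  # the reference itself stays untouched
    for slot in extract_schema(reference):
        if slot in content:
            replace_text(slide, slot, content[slot])
        else:
            delete_element(slide, slot)  # drop slots with no content
    return slide


# Usage: fill a reference slide with one outline entry.
reference = Slide("title+bullets", [
    Element("title", "Reference title"),
    Element("bullets", "Reference bullets"),
    Element("footnote", "Reference footnote"),
])
outline_entry = {"title": "Method", "bullets": "Stage I: analyze; Stage II: generate"}
new_slide = generate_slide(reference, outline_entry)
print([(e.element_id, e.text) for e in new_slide.elements])
# [('title', 'Method'), ('bullets', 'Stage I: analyze; Stage II: generate')]
```

Restricting generation to a small set of edit actions on a cloned reference is what lets the output keep the reference's visual design: the model only decides which slots to fill or drop, not how to lay them out.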