Preacher: Paper-to-Video Agentic System

📅 2025-08-13



🤖 AI Summary

Current video generation models face critical limitations when converting academic papers into structured video summaries: narrow context windows, restricted output duration, limited stylistic diversity, and imprecise representation of domain knowledge. To address these challenges, we propose the first agent-based system designed specifically for the "paper-to-video" task. It adopts a two-stage paradigm that combines top-down content decomposition with bottom-up clip generation. The approach introduces key-scene definitions and Progressive Chain-of-Thought (P-CoT) reasoning to achieve fine-grained cross-modal alignment and accurate modeling of domain-specific knowledge. The system unifies LLM-driven reasoning, multi-granularity summarization, controllable video generation, and compositional synthesis, supporting end-to-end task planning and content orchestration. Evaluated across five academic disciplines, the generated video summaries show statistically significant improvements over baselines in domain expertise, narrative coherence, and stylistic diversity.


📝 Abstract

The paper-to-video task converts a research paper into a structured video abstract, distilling key concepts, methods, and conclusions into an accessible, well-organized format. While state-of-the-art video generation models demonstrate potential, they are constrained by limited context windows, rigid video duration constraints, limited stylistic diversity, and an inability to represent domain-specific knowledge. To address these limitations, we introduce Preacher, the first paper-to-video agentic system. Preacher employs a top-down approach to decompose, summarize, and reformulate the paper, followed by bottom-up video generation, synthesizing diverse video segments into a coherent abstract. To align cross-modal representations, we define key scenes and introduce a Progressive Chain of Thought (P-CoT) for granular, iterative planning. Preacher successfully generates high-quality video abstracts across five research fields, demonstrating expertise beyond current video generation models. Code will be released at: https://github.com/Gen-Verse/Paper2Video
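The bottom-up half of this design, turning planned key scenes into a sequence of short clips that respects a generator's fixed output length, can be illustrated with a minimal sketch. It assumes a hypothetical `KeyScene` record produced by the top-down stage and a hypothetical per-clip cap `MAX_CLIP_SECONDS` standing in for the "rigid video duration constraints" mentioned above. Actual video generation is stubbed out, so the code only computes the clip schedule; none of these names come from the Preacher code release.

```python
import math
from dataclasses import dataclass

# Hypothetical limit: assume the underlying video generator can only
# produce clips up to this many seconds long.
MAX_CLIP_SECONDS = 5.0


@dataclass
class KeyScene:
    # One unit of the (hypothetical) top-down plan.
    title: str
    duration: float  # target length in seconds
    style: str       # illustrative style label, e.g. "slides" or "animation"


@dataclass
class Clip:
    # One generated segment placed on the final timeline.
    scene: str
    style: str
    start: float   # start time in the composed video, in seconds
    length: float  # clip length in seconds


def split_scene(scene: KeyScene, max_len: float = MAX_CLIP_SECONDS) -> list[float]:
    """Split a scene's duration into equal pieces, each no longer than max_len."""
    n = max(1, math.ceil(scene.duration / max_len))
    return [scene.duration / n] * n


def compose_timeline(scenes: list[KeyScene]) -> list[Clip]:
    """Bottom-up assembly: lay each scene's clips end to end on one timeline."""
    timeline: list[Clip] = []
    t = 0.0
    for scene in scenes:
        for length in split_scene(scene):
            timeline.append(Clip(scene.title, scene.style, t, length))
            t += length
    return timeline


scenes = [
    KeyScene("Motivation", 8.0, "animation"),
    KeyScene("Method", 12.0, "slides"),
    KeyScene("Results", 4.0, "chart"),
]
timeline = compose_timeline(scenes)
for clip in timeline:
    print(f"{clip.start:5.1f}s  +{clip.length:.1f}s  {clip.scene} ({clip.style})")
```

Splitting each scene into equal-length pieces, rather than filling clips to the cap and leaving a short remainder, is a choice made here for illustration; it keeps adjacent clips comparable in length.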

Problem

Research questions and friction points this paper is trying to address.

Converts research papers into structured video abstracts

Overcomes limitations of current video generation models

Aligns cross-modal representations for coherent video synthesis

Innovation

Methods, ideas, or system contributions that make the work stand out.

Top-down paper decomposition and summarization

Bottom-up diverse video segment synthesis

Progressive Chain of Thought for planning
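One way to read "progressive" planning is as a coarse-to-fine chain in which each step is conditioned on the previous step's output: summarize a section, derive key scenes from that summary, then refine each scene into a generation prompt. The sketch below assumes exactly that reading. The `llm` callable, the prompt wording, and `fake_llm` are hypothetical stand-ins, not Preacher's actual prompts or P-CoT procedure, and the sketch models no feedback or verification step the paper may use.

```python
from typing import Callable

# Hypothetical stand-in for an LLM call: maps a prompt string to a text answer.
LLM = Callable[[str], str]


def progressive_plan(paper_sections: dict[str, str], llm: LLM) -> dict[str, list[str]]:
    """Coarse-to-fine planning: each stage conditions on the previous stage's output."""
    plan: dict[str, list[str]] = {}
    for name, text in paper_sections.items():
        # Stage 1 (coarse): summarize the section.
        summary = llm(f"Summarize section '{name}': {text}")
        # Stage 2: derive key scenes from the stage-1 summary, one per line.
        scenes_raw = llm(f"List key scenes, one per line, for: {summary}")
        scenes = [s.strip() for s in scenes_raw.splitlines() if s.strip()]
        # Stage 3 (fine): refine each stage-2 scene into a video-generation prompt.
        plan[name] = [llm(f"Write a video prompt for scene: {s}") for s in scenes]
    return plan


def fake_llm(prompt: str) -> str:
    """Deterministic hypothetical LLM so the sketch runs without a model."""
    if prompt.startswith("Summarize"):
        return "summary of " + prompt.split("'")[1]
    if prompt.startswith("List key scenes"):
        return "intro\n\ndetail"
    return "PROMPT<" + prompt.split(": ", 1)[1] + ">"


plan = progressive_plan(
    {"Method": "We propose a two-stage pipeline.", "Results": "We evaluate on five fields."},
    fake_llm,
)
print(plan)
```

Keeping each stage's output as the next stage's only input makes every intermediate decision inspectable, which is the property a granular, iterative planner needs before any video is generated.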
