Preacher: Paper-to-Video Agentic System

📅 2025-08-13

🤖 AI Summary
Current video generation models are poorly suited to turning academic papers into structured video summaries: their context windows are narrow, their output duration is restricted, their visual style is uniform, and they represent domain knowledge imprecisely. To address these limitations, we propose Preacher, the first agent-based system designed for the paper-to-video task. It follows a two-stage paradigm that combines top-down content decomposition with bottom-up clip generation. Key-scene definitions and Progressive Chain-of-Thought (P-CoT) reasoning provide fine-grained cross-modal alignment and more accurate modeling of domain-specific knowledge. The system brings together large language model reasoning, multi-granularity summarization, controllable video generation, and compositional synthesis to support end-to-end task planning and content orchestration. Evaluated across five research fields, the generated video abstracts improve on baseline video generation models in domain expertise, narrative coherence, and stylistic diversity.

📝 Abstract
The paper-to-video task converts a research paper into a structured video abstract, distilling key concepts, methods, and conclusions into an accessible, well-organized format. While state-of-the-art video generation models demonstrate potential, they are constrained by limited context windows, rigid video duration constraints, limited stylistic diversity, and an inability to represent domain-specific knowledge. To address these limitations, we introduce Preacher, the first paper-to-video agentic system. Preacher employs a top-down approach to decompose, summarize, and reformulate the paper, followed by bottom-up video generation, synthesizing diverse video segments into a coherent abstract. To align cross-modal representations, we define key scenes and introduce a Progressive Chain of Thought (P-CoT) for granular, iterative planning. Preacher successfully generates high-quality video abstracts across five research fields, demonstrating expertise beyond current video generation models. Code will be released at: https://github.com/Gen-Verse/Paper2Video
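The two-stage flow described in the abstract can be pictured roughly as follows. This is only a minimal sketch under stated assumptions, not Preacher's implementation: the names `KeyScene`, `decompose`, `generate_clip`, `assemble`, and `paper_to_video` are hypothetical, word truncation stands in for LLM summarization, and a text placeholder stands in for a generated video clip.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class KeyScene:
    """Hypothetical record pairing a condensed paper chunk with a visual style."""
    section: str
    summary: str
    style: str  # illustrative labels only, e.g. "slides" or "animation"


def decompose(paper: Dict[str, str], max_words: int = 12) -> List[KeyScene]:
    """Top-down stage: split the paper by section and condense each part.

    Truncation is a stand-in for an LLM summarizer (an assumption of this sketch).
    """
    scenes = []
    for section, text in paper.items():
        summary = " ".join(text.split()[:max_words])
        # Assumed rule for illustration: method sections get a different style.
        style = "animation" if section.lower() == "method" else "slides"
        scenes.append(KeyScene(section, summary, style))
    return scenes


def generate_clip(scene: KeyScene) -> str:
    """Bottom-up stage: one clip per key scene (a text placeholder, not real video)."""
    return f"[{scene.style}] {scene.section}: {scene.summary}"


def assemble(clips: List[str]) -> str:
    """Join the clips in paper order to form the video abstract."""
    return "\n".join(clips)


def paper_to_video(paper: Dict[str, str]) -> str:
    """Run decomposition, per-scene generation, and assembly in sequence."""
    return assemble([generate_clip(scene) for scene in decompose(paper)])
```

Calling `paper_to_video({"Introduction": "...", "Method": "..."})` yields one placeholder clip per section, in the order the sections appear. Producing short clips and stitching them together is one way a pipeline can work around the duration limit of a single generation call.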
Problem

Research questions and friction points this paper is trying to address.

Converts research papers into structured video abstracts
Overcomes limitations of current video generation models
Aligns cross-modal representations for coherent video synthesis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Top-down paper decomposition and summarization
Bottom-up diverse video segment synthesis
Progressive Chain of Thought for planning
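
One way to read "progressive" planning is that each planning decision is made with all earlier decisions in view. The sketch below illustrates that idea only. The stage names in `STAGES`, the prompt wording, and the functions `progressive_plan` and `echo_reasoner` are assumptions made for illustration. This summary does not specify the paper's actual P-CoT stages or prompts.

```python
from typing import Callable, Dict, List

# Hypothetical planning stages, ordered from coarse to fine.
STAGES: List[str] = ["goal", "visual_style", "narration", "shot_prompt"]


def progressive_plan(
    summary: str,
    reason: Callable[[str], str],
    stages: List[str] = STAGES,
) -> Dict[str, str]:
    """Plan one key scene step by step, feeding earlier decisions into later prompts."""
    plan: Dict[str, str] = {}
    for stage in stages:
        decided = "; ".join(f"{name}={value}" for name, value in plan.items())
        prompt = (
            f"Summary: {summary}\n"
            f"Decided so far: {decided or 'nothing'}\n"
            f"Decide: {stage}"
        )
        plan[stage] = reason(prompt)  # `reason` would wrap an LLM call in practice
    return plan


def echo_reasoner(prompt: str) -> str:
    """Stand-in for an LLM: echoes back the name of the stage it was asked to decide."""
    return "decided:" + prompt.rsplit("Decide: ", 1)[1]
```

With `echo_reasoner` plugged in, `progressive_plan("some summary", echo_reasoner)` returns one entry per stage. Passing a real model call as `reason` would let later stages, such as the shot prompt, depend on the style and narration chosen earlier.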