Presenting a Paper is an Art: Self-Improvement Aesthetic Agents for Academic Presentations

📅 2025-10-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing automated academic presentation systems suffer from narrative discontinuity, suboptimal visual aesthetics, and absence of self-improvement capabilities. This paper introduces PresAesth—the first multi-task reinforcement learning framework for aesthetic modeling in academic presentations—capable of aesthetic scoring, defect correction, and iterative self-refinement driven by comparative feedback, even under data scarcity. We propose EvoPresent Benchmark, the first evaluation benchmark jointly quantifying content quality and aesthetic perception. Furthermore, we design an end-to-end slide generation pipeline integrating narrative generation, virtual presenter embodiment, and closed-loop aesthetic optimization. Experiments demonstrate that high-quality comparative feedback significantly enhances presentation quality, uncovering fundamental trade-offs between content fidelity and visual design. Multi-task RL exhibits superior generalizability in aesthetic modeling compared to single-task alternatives.

📝 Abstract
The promotion of academic papers has become an important means of enhancing research visibility. However, existing automated methods struggle with limited storytelling, insufficient aesthetic quality, and constrained self-adjustment, making it difficult to achieve efficient and engaging dissemination. At the heart of these challenges is a simple principle: *there is no way to improve it when you cannot evaluate it right*. To address this, we introduce **EvoPresent**, a self-improvement agent framework that unifies coherent narratives, aesthetic-aware designs, and realistic presentation delivery via virtual characters. Central to EvoPresent is **PresAesth**, a multi-task reinforcement learning (RL) aesthetic model that provides reliable aesthetic scoring, defect adjustment, and comparative feedback, enabling iterative self-improvement even under limited aesthetic training data. To systematically evaluate the methods, we introduce the **EvoPresent Benchmark**, a comprehensive benchmark comprising: *Presentation Generation Quality*, built on 650 top-tier AI conference papers with multimodal resources (slides, videos, and scripts) to assess both content and design; and *Aesthetic Awareness*, consisting of 2,000 slide pairs with varying aesthetic levels, supporting joint training and evaluation on scoring, defect adjustment, and comparison. Our findings highlight that (i) high-quality feedback is essential for agent self-improvement, while initial capability alone does not guarantee effective self-correction; (ii) automated generation pipelines exhibit a trade-off between visual design and content construction; and (iii) multi-task RL training shows stronger generalization in aesthetic awareness tasks.
Problem

Research questions and friction points this paper is trying to address.

Automated academic presentation methods lack storytelling and aesthetic quality
Existing systems cannot self-adjust without reliable aesthetic evaluation
Current approaches struggle to balance visual design with content construction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-improvement agent framework for academic presentations
Multi-task reinforcement learning aesthetic model PresAesth
Comprehensive benchmark with multimodal resources for evaluation
Chengzhi Liu
PhD, UC Santa Barbara
Vision Language Model · Trustworthy AI · Reasoning
Yuzhe Yang
University of California, Santa Barbara
Kaiwen Zhou
University of California, Santa Cruz
Zhen Zhang
University of California, Santa Barbara
Yue Fan
University of California, Santa Cruz
Yannan Xie
Uniphore
Peng Qi
Uniphore
Xin Eric Wang
Assistant Professor, University of California, Santa Barbara; Simular
NLP · CV · ML · Language and Vision · AI Agents