Any2Poster: Any-Source Poster Generation Across Modalities and Domains

📅 2026-06-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the lack of a unified evaluation framework for automatic poster generation that spans input modalities and content domains. Existing evaluations are typically limited to paper-only inputs, narrow domains, or surface-level visual similarity, so they do not jointly measure information fidelity and visual effectiveness. To close this gap, the authors introduce Any2Poster Bench, a benchmark covering eight input modalities and five content domains, together with Any2Poster Agent, an end-to-end reference system that generates posters from any of these sources. The agent parses heterogeneous sources, plans layouts, renders posters, and iteratively refines them using vision-language model (VLM) feedback. On Any2Poster Bench it reaches 87.25% average accuracy across input modalities and 87.28% across content domains. On PaperQuiz-style evaluation it reaches 72.58% overall accuracy and a density-augmented score of 145.16, compared with 51.06-51.33% and 116-121 for PosterAgent-4o.
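The render, critique, and revise cycle described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: `refine_poster`, `Critique`, the callables `render`, `critique`, and `revise`, the `threshold` and `max_rounds` parameters, and the single numeric score are all hypothetical stand-ins for the rendering stage and the VLM judge. The loop renders a layout, asks the critic for a score and a list of issues, and revises until the score clears a threshold or the round budget runs out.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Layout = dict  # hypothetical: a layout is any dict describing panels, fonts, etc.


@dataclass
class Critique:
    score: float       # hypothetical overall quality score in [0, 1]
    issues: List[str]  # problems the critic reports (empty means none)


def refine_poster(
    layout: Layout,
    render: Callable[[Layout], bytes],
    critique: Callable[[bytes], Critique],
    revise: Callable[[Layout, List[str]], Layout],
    threshold: float = 0.9,
    max_rounds: int = 3,
) -> Tuple[Layout, List[float]]:
    """Render -> critique -> revise; return the best layout seen and the score history."""
    best_layout, best_score = layout, float("-inf")
    history: List[float] = []
    for _ in range(max_rounds):
        feedback = critique(render(layout))
        history.append(feedback.score)
        if feedback.score > best_score:
            best_layout, best_score = layout, feedback.score
        if feedback.score >= threshold or not feedback.issues:
            break  # good enough, or nothing left to fix
        layout = revise(layout, feedback.issues)
    return best_layout, history


if __name__ == "__main__":
    # Toy stand-ins: this "critic" only checks that body text is large enough.
    def render(layout: Layout) -> bytes:
        return str(layout["font_size"]).encode()

    def critique(image: bytes) -> Critique:
        size = int(image.decode())
        return Critique(min(size / 20, 1.0), ["text too small"] if size < 18 else [])

    def revise(layout: Layout, issues: List[str]) -> Layout:
        return {**layout, "font_size": layout["font_size"] + 2}

    print(refine_poster({"font_size": 14}, render, critique, revise))
    # ({'font_size': 18}, [0.7, 0.8, 0.9])
```

Returning the best-scoring layout rather than the last one guards against a final-round revision that makes the poster worse, or one that is never re-rendered because the budget ran out.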
📝 Abstract
Visual posters are a compact medium for communicating dense information, yet progress on automatic poster generation remains difficult to measure because existing evaluations are often restricted to paper-only inputs, narrow domains, or surface-level visual similarity. We introduce Any2Poster Bench, a benchmark for any-source poster generation that evaluates systems across eight input modalities (PDFs, URLs, PPTX, DOCX, Markdown, LaTeX, notebooks, and videos) and five content domains. Any2Poster Bench pairs each source with quiz-based probes of verbatim factual retention and interpretive understanding, together with VLM-based judgments of visual quality, layout, readability, content completeness, and logical flow, enabling reproducible assessment of both information fidelity and visual communication. To instantiate and validate this benchmark, we further present Any2Poster Agent, an end-to-end reference agent that parses heterogeneous sources, organizes salient content, plans poster layouts, renders posters, and iteratively refines them using visual feedback. On Any2Poster Bench, Any2Poster Agent achieves 87.25% average accuracy across input modalities and 87.28% across content domains. On PaperQuiz-style evaluation, where prior paper-to-poster agents are directly comparable, Any2Poster Agent improves over PosterAgent-4o from 51.06-51.33% to 72.58% overall accuracy and from 116-121 to 145.16 in density-augmented score. Together, Any2Poster Bench and Any2Poster Agent provide a reusable evaluation resource and a competitive baseline for studying multimodal, domain-general poster generation.
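The abstract reports quiz accuracy averaged across input modalities and across content domains, but does not spell out the aggregation. The sketch below shows one plausible scheme, under assumptions the source does not confirm: each source's quiz is scored as questions answered correctly from the poster alone, answers are pooled per group, and the per-group accuracies are then averaged without weighting. `QuizResult` and `group_accuracy` are hypothetical names, and the demo numbers and domain labels are invented for illustration, not results from the paper.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, NamedTuple, Tuple


class QuizResult(NamedTuple):
    """Quiz outcome for one source document (hypothetical record layout)."""
    modality: str  # input type, e.g. "pdf" or "video"
    domain: str    # content domain label (hypothetical names below)
    correct: int   # questions answered correctly using only the poster
    total: int     # questions asked about this source


def group_accuracy(results: List[QuizResult], key: str) -> Tuple[Dict[str, float], float]:
    """Pool answers per group (key="modality" or "domain"), then return the
    per-group accuracies and their unweighted mean, all in percent."""
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for r in results:
        bucket = counts[getattr(r, key)]
        bucket[0] += r.correct
        bucket[1] += r.total
    per_group = {name: 100.0 * c / t for name, (c, t) in counts.items()}
    return per_group, mean(per_group.values())


if __name__ == "__main__":
    # Invented toy numbers, for illustration only.
    demo = [
        QuizResult("pdf", "science", 9, 10),
        QuizResult("pdf", "business", 7, 10),
        QuizResult("video", "science", 6, 10),
    ]
    print(group_accuracy(demo, "modality"))
    # ({'pdf': 80.0, 'video': 60.0}, 70.0)
```

Averaging per-group accuracies keeps a modality or domain with many sources from dominating the headline number; a micro-average over all questions would weight groups by their size instead.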
Problem

Research questions and friction points this paper is trying to address.

poster generation
multimodal input
cross-domain evaluation
information fidelity
visual communication
Innovation

Methods, ideas, or system contributions that make the work stand out.

any-source poster generation
multimodal benchmark
vision-language model (VLM)
end-to-end agent
information fidelity