Any2Poster: Any-Source Poster Generation Across Modalities and Domains

📅 2026-06-01

🤖 AI Summary

This work addresses the lack of a unified evaluation framework for automatic poster generation across input modalities and content domains, a gap that makes it hard to tell whether systems balance information fidelity with visual effectiveness. The authors introduce Any2Poster Bench, a benchmark covering eight input modalities and five content domains, together with Any2Poster Agent, an end-to-end reference system that combines heterogeneous source parsing, content organization, layout planning, rendering, and iterative refinement driven by vision-language model (VLM) feedback. On Any2Poster Bench, the agent achieves average accuracies of 87.25% across input modalities and 87.28% across content domains. On PaperQuiz-style evaluation, it reaches 72.58% overall accuracy and a density-augmented score of 145.16, compared with 51.06-51.33% and 116-121 for PosterAgent-4o.
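The iterative refinement step can be pictured as a render, critique, revise loop. The following is a minimal sketch under stated assumptions, not the authors' implementation: `refine_poster`, `Feedback`, the `threshold` and `max_rounds` parameters, and the three callables are all hypothetical names introduced here for illustration, and the VLM critic is assumed to return a scalar score plus textual suggestions.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class Feedback:
    """Hypothetical critic output: a quality score in [0, 1] plus suggestions."""
    score: float
    suggestions: List[str]


def refine_poster(
    layout: Any,
    render: Callable[[Any], Any],
    critique: Callable[[Any], Feedback],
    revise: Callable[[Any, List[str]], Any],
    threshold: float = 0.9,
    max_rounds: int = 3,
) -> Tuple[Any, Any]:
    """Render a layout, then revise it until the critic's score reaches
    `threshold` or `max_rounds` critiques have been spent.

    `render`, `critique`, and `revise` are placeholders for the rendering
    engine, the VLM-based judge, and the layout editor, respectively.
    """
    poster = render(layout)
    for _ in range(max_rounds):
        feedback = critique(poster)
        if feedback.score >= threshold:
            break  # the critic is satisfied; keep the current poster
        layout = revise(layout, feedback.suggestions)
        poster = render(layout)
    return poster, layout
```

Passing the stages in as callables keeps the loop independent of any particular renderer or judge model, which the summary does not specify.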

📝 Abstract

Visual posters are a compact medium for communicating dense information, yet progress on automatic poster generation remains difficult to measure because existing evaluations are often restricted to paper-only inputs, narrow domains, or surface-level visual similarity. We introduce Any2Poster Bench, a benchmark for any-source poster generation that evaluates systems across eight input modalities--PDFs, URLs, PPTX, DOCX, Markdown, LaTeX, notebooks, and videos--and five content domains. Any2Poster Bench pairs each source with quiz-based probes of verbatim factual retention and interpretive understanding, together with VLM-based judgments of visual quality, layout, readability, content completeness, and logical flow, enabling reproducible assessment of both information fidelity and visual communication. To instantiate and validate this benchmark, we further present Any2Poster Agent, an end-to-end reference agent that parses heterogeneous sources, organizes salient content, plans poster layouts, renders posters, and iteratively refines them using visual feedback. On Any2Poster Bench, Any2Poster Agent achieves 87.25% average accuracy across input modalities and 87.28% across content domains. On PaperQuiz-style evaluation, where prior paper-to-poster agents are directly comparable, Any2Poster Agent improves over PosterAgent-4o from 51.06-51.33% to 72.58% overall accuracy and from 116-121 to 145.16 in density-augmented score. Together, Any2Poster Bench and Any2Poster Agent provide a reusable evaluation resource and a competitive baseline for studying multimodal, domain-general poster generation.
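To make the reported averages concrete, here is a small sketch of how quiz accuracy could be aggregated per input modality (or per content domain) and then averaged. It assumes an unweighted mean over groups; the abstract does not say whether the 87.25% and 87.28% figures are computed this way, and the function name, record format, and group labels below are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def grouped_accuracy(
    records: Iterable[Tuple[str, bool]],
) -> Tuple[Dict[str, float], float]:
    """Compute per-group accuracy and the unweighted mean across groups.

    Each record is a (group, is_correct) pair, where `group` is, for example,
    an input modality such as "pdf" or "video" (labels are assumptions).
    """
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for group, is_correct in records:
        totals[group] += 1
        hits[group] += int(is_correct)
    if not totals:
        raise ValueError("no quiz records supplied")
    per_group = {g: hits[g] / totals[g] for g in totals}
    mean_over_groups = sum(per_group.values()) / len(per_group)
    return per_group, mean_over_groups
```

With an unweighted mean, a modality with few quiz questions counts as much as one with many; a question-weighted average would instead divide total correct answers by total questions.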

Problem

Research questions and friction points this paper is trying to address.

poster generation

multimodal input

cross-domain evaluation

information fidelity

visual communication

Innovation

Methods, ideas, or system contributions that make the work stand out.

any-source poster generation

multimodal benchmark

vision-language model (VLM)

end-to-end agent

information fidelity
