🤖 AI Summary
Existing automated survey systems rely on single-shot batch retrieval and static outlines, leading to high noise, fragmented structures, and contextual overload. This paper proposes IterativeSurvey, an iterative survey generation framework that emulates the reading process of human researchers. Methodologically, it integrates multi-stage planning agents, paper-level information extraction, retrieval-generation co-optimization, and multimodal fusion. Its core contributions include: (1) a cyclic outline generation mechanism coupled with paper-card modeling to enable dynamic structural evolution; (2) a visualization-enhanced review-refinement closed loop; and (3) Survey-Arena, a newly constructed paired benchmark for survey evaluation. Experiments demonstrate that IterativeSurvey significantly outperforms state-of-the-art methods on both emerging and established topics, yielding surveys with broader coverage, stronger logical coherence, more precise citations, and quality approaching human-authored standards.
📝 Abstract
Automatic literature survey generation has attracted increasing attention, yet most existing systems follow a one-shot paradigm, where a large set of papers is retrieved at once and a static outline is generated before drafting. This design often leads to noisy retrieval, fragmented structures, and context overload, ultimately limiting survey quality. Inspired by the iterative reading process of human researchers, we propose IterativeSurvey, a framework based on recurrent outline generation, in which a planning agent incrementally retrieves, reads, and updates the outline to balance exploration and coherence. To provide faithful paper-level grounding, we design paper cards that distill each paper into its contributions, methods, and findings, and introduce a review-and-refine loop with visualization enhancement to improve textual flow and integrate multimodal elements such as figures and tables. Experiments on both established and emerging topics show that IterativeSurvey substantially outperforms state-of-the-art baselines in content coverage, structural coherence, and citation quality, while producing more accessible and better-organized surveys. To assess such improvements more reliably, we further introduce Survey-Arena, a pairwise benchmark that complements absolute scoring and more clearly positions machine-generated surveys relative to human-written ones. The code is available at https://github.com/HancCui/IterSurvey_Autosurveyv2.