Talk2Image: A Multi-Agent System for Multi-Turn Image Generation and Editing

📅 2025-08-09
🤖 AI Summary
Existing text-to-image generation methods primarily target single-turn tasks: they lack support for multi-turn, iterative creative editing and suffer from intent drift and editing discontinuity. To address this, we propose the first multi-agent collaborative framework for multi-turn text-guided image generation and editing. Our approach decomposes complex editing tasks, assigns them to agents with specialized roles (e.g., Intent Analyst, Edit Executor, Consistency Evaluator), and uses a multi-perspective feedback mechanism to keep the image progressively aligned with user intent and continuously refined. It combines dialogue-history-aware intent modeling, structured task orchestration, and three-dimensional evaluation covering semantic, visual, and temporal coherence. Experiments show that the system significantly outperforms state-of-the-art single-agent conversational methods in editing controllability, cross-turn consistency, and user satisfaction. This work establishes a scalable, cooperative paradigm for interactive AI-generated content.

📝 Abstract
Text-to-image generation has driven remarkable advances across diverse media applications, yet most existing methods focus on single-turn scenarios and struggle with iterative, multi-turn creative tasks. Recent dialogue-based systems attempt to bridge this gap, but their single-agent, sequential paradigm often causes intention drift and incoherent edits. To address these limitations, we present Talk2Image, a novel multi-agent system for interactive image generation and editing in multi-turn dialogue scenarios. Our approach integrates three key components: intention parsing from dialogue history, task decomposition and collaborative execution across specialized agents, and feedback-driven refinement based on a multi-view evaluation mechanism. Talk2Image enables step-by-step alignment with user intention and consistent image editing. Experiments demonstrate that Talk2Image outperforms existing baselines in controllability, coherence, and user satisfaction across iterative image generation and editing tasks.
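The three components named in the abstract (intention parsing from dialogue history, task decomposition with collaborative execution, and feedback-driven refinement from multi-view evaluation) can be pictured as one control loop. The Python below is a minimal, hypothetical sketch of such a loop under stated assumptions, not Talk2Image's implementation: an "image" is a toy dictionary of attributes, each dialogue turn is a "key=value; key=value" string, the executor deliberately applies only `budget` edits per round, and the two evaluation views (semantic match and consistency of unrequested content) are illustrative stand-ins for the paper's multi-view evaluation. All function names are invented for this example.

```python
from dataclasses import dataclass, field

# Toy stand-in (assumption): an "image" is a mapping of attribute -> value.
Image = dict[str, str]


@dataclass
class Intention:
    """Accumulated user intention: attribute constraints the image must satisfy."""
    constraints: dict[str, str] = field(default_factory=dict)


def parse_intention(history: list[str]) -> Intention:
    """Hypothetical parser over the whole dialogue history.

    Each turn looks like 'sky=night; cat=orange'. A later turn overrides an
    earlier value for the same key, while keys it does not mention persist,
    so earlier requests are not silently dropped (one way to limit drift).
    """
    intention = Intention()
    for turn in history:
        for clause in turn.split(";"):
            if "=" in clause:
                key, value = clause.split("=", 1)
                intention.constraints[key.strip()] = value.strip()
    return intention


def decompose(intention: Intention, image: Image) -> list[tuple[str, str]]:
    """Planner agent: one sub-task per constraint the image does not yet meet."""
    return [(k, v) for k, v in intention.constraints.items() if image.get(k) != v]


def execute(image: Image, subtasks: list[tuple[str, str]], budget: int) -> Image:
    """Executor agent: applies at most `budget` sub-tasks per round, simulating
    an imperfect editor so that the feedback loop has work to do."""
    edited = dict(image)  # copy, so the previous image is left unchanged
    for key, value in subtasks[:budget]:
        edited[key] = value
    return edited


def _fraction(hits: int, total: int) -> float:
    # An empty set of checks counts as fully satisfied.
    return hits / total if total else 1.0


def evaluate(intention: Intention, before: Image, after: Image) -> dict[str, float]:
    """Evaluator agent with two hypothetical views, each scored in [0, 1]:
    semantic = fraction of requested constraints satisfied;
    consistency = fraction of unrequested attributes left unchanged."""
    wanted = intention.constraints
    semantic = _fraction(sum(after.get(k) == v for k, v in wanted.items()), len(wanted))
    untouched = [k for k in before if k not in wanted]
    consistency = _fraction(sum(after.get(k) == before[k] for k in untouched), len(untouched))
    return {"semantic": semantic, "consistency": consistency}


def run_turn(history: list[str], image: Image, budget: int = 1,
             threshold: float = 1.0, max_rounds: int = 5) -> tuple[Image, dict[str, float], int]:
    """Parse -> decompose -> execute -> evaluate, refining until every view
    reaches `threshold` or `max_rounds` edit rounds have been spent."""
    intention = parse_intention(history)
    current = image
    scores = evaluate(intention, image, current)
    rounds = 0
    while rounds < max_rounds and min(scores.values()) < threshold:
        current = execute(current, decompose(intention, current), budget)
        scores = evaluate(intention, image, current)
        rounds += 1
    return current, scores, rounds
```

For example, `run_turn(["sky=day; cat=black", "cat=orange; hat=red"], {"sky": "day", "cat": "black", "tree": "oak"})` needs two refinement rounds: the second turn's request overrides the cat colour, the first turn's `sky=day` is still enforced, and the unrequested `tree` attribute is left untouched. Taking the minimum over the views means a single weak view is enough to trigger another round.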
Problem

Research questions and friction points this paper is trying to address.

Addresses multi-turn image generation and editing challenges
Solves intention drift and incoherent edits in dialogues
Enables iterative creative tasks with user intention alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent system for interactive image tasks
Intention parsing from dialogue history
Refinement driven by multi-view evaluation
Authors
Shichao Ma
University of Science and Technology of China
Yunhe Guo
University of Science and Technology of China
Jiahao Su
University of Science and Technology of China
Qihe Huang
University of Science and Technology of China
Zhengyang Zhou
University of Science and Technology of China
Research interests: spatiotemporal data mining, machine learning, deep learning, urban computing
Yang Wang
University of Science and Technology of China