Optimizing LLM-Based Multi-Agent System with Textual Feedback: A Case Study on Software Development

📅 2025-05-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current LLM-driven role-based multi-agent systems show weak collaboration performance and limited interpretability on complex tasks such as software development. To address this, the paper proposes a two-stage prompt optimization framework guided by natural language feedback: first, failure attribution identifies underperforming agents and uncovers root causes; second, the system prompts of those agents are reconstructed using the failure explanations. The work integrates attribution analysis with hierarchical prompt optimization and presents a systematic empirical comparison of three optimization settings (online vs. offline, individual vs. group, and one-pass vs. multi-pass prompting) and their effect on multi-agent collaborative behavior. Experiments demonstrate significant improvements across critical dimensions including code generation, requirements understanding, and debugging/repair. The framework establishes a reproducible and scalable empirical paradigm for enhancing interpretability and collaboration efficacy in multi-agent systems tackling complex tasks.

📝 Abstract
We have seen remarkable progress in multi-agent systems empowered by large language models (LLMs) that solve complex tasks requiring cooperation among experts with diverse skills. However, optimizing LLM-based multi-agent systems remains challenging. In this work, we perform an empirical case study on group optimization of role-based multi-agent systems using natural language feedback for challenging software development tasks under various evaluation dimensions. We propose a two-step agent-prompt optimization pipeline: first identifying underperforming agents and their failure explanations from textual feedback, then optimizing the system prompts of those agents using the explanations. We then study the impact of various optimization settings on system performance with two comparison groups: online versus offline optimization and individual versus group optimization. For group optimization, we study two prompting strategies: one-pass and multi-pass prompt optimization. Overall, we demonstrate the effectiveness of our optimization method for role-based multi-agent systems tackling software development tasks across diverse evaluation dimensions, and we investigate how different optimization settings shape the group behaviors of multi-agent systems to provide practical insights for future development.
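The two-step pipeline from the abstract can be sketched in code. This is a minimal, hypothetical illustration, not the authors' implementation: the function names (`attribute_failures`, `optimize_prompts`) and the `llm` stub are assumptions, and a real system would replace `llm` with an actual model API call.

```python
def llm(prompt: str) -> str:
    """Stub standing in for a real LLM call; replace with an API client."""
    if "Which agents failed" in prompt:
        return "coder: misread the requirement spec"
    return "You are a careful coder. Re-read the requirement spec before writing code."

def attribute_failures(transcript: str, agents: list[str]) -> dict[str, str]:
    """Step 1: ask the LLM which agents underperformed and why,
    yielding textual failure explanations per blamed agent."""
    answer = llm(
        "Which agents failed and why?\n"
        f"Agents: {', '.join(agents)}\nTranscript:\n{transcript}"
    )
    blamed = {}
    for line in answer.splitlines():
        name, _, reason = line.partition(":")
        if name.strip() in agents:
            blamed[name.strip()] = reason.strip()
    return blamed

def optimize_prompts(system_prompts: dict[str, str], transcript: str) -> dict[str, str]:
    """Step 2: rewrite system prompts only for the agents blamed in step 1."""
    blamed = attribute_failures(transcript, list(system_prompts))
    updated = dict(system_prompts)
    for agent, reason in blamed.items():
        updated[agent] = llm(
            f"Rewrite this system prompt to fix the failure '{reason}':\n"
            f"{system_prompts[agent]}"
        )
    return updated
```

The key design point the sketch captures is that only the agents identified in the attribution step have their prompts rewritten; unblamed agents keep their original system prompts.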
Problem

Research questions and friction points this paper is trying to address.

Optimizing LLM-based multi-agent systems using textual feedback
Improving software development tasks through agent prompt optimization
Studying optimization settings' impact on multi-agent system performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-step agent-prompt optimization pipeline
Textual feedback used to identify underperforming agents and explain their failures
Systematic comparison of online vs. offline and individual vs. group optimization settings
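The online-versus-offline comparison studied in the paper can be illustrated with a short sketch. This is a hedged simplification under assumed names: `update_prompts` is a hypothetical placeholder for the feedback-driven rewrite step, not the authors' API.

```python
def update_prompts(prompts: dict[str, str], feedback: str) -> dict[str, str]:
    """Placeholder for the feedback-driven prompt rewrite (one optimization step)."""
    return {k: v + f" [revised after: {feedback}]" for k, v in prompts.items()}

def offline_optimize(prompts: dict[str, str], traces: list[str]) -> dict[str, str]:
    # Offline: collect all failure traces first, then apply a single batched update.
    return update_prompts(prompts, "; ".join(traces))

def online_optimize(prompts: dict[str, str], traces: list[str]) -> dict[str, str]:
    # Online: revise prompts immediately after each task's feedback arrives.
    for trace in traces:
        prompts = update_prompts(prompts, trace)
    return prompts
```

The sketch makes the trade-off concrete: offline optimization performs one update over aggregated feedback, while online optimization performs one update per task, so later tasks run under already-revised prompts.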