Multi-Agent Debate Strategies to Enhance Requirements Engineering with Large Language Models

📅 2025-07-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) exhibit bias and insufficient robustness when applied to requirements engineering (RE) tasks, particularly due to reliance on single-model outputs. Method: This paper proposes a Multi-Agent Debate (MAD) strategy—constructing a collaborative LLM agent framework that integrates prompt engineering and cooperative reasoning to iteratively refine requirements understanding, classification, and validation from multiple perspectives. Contribution/Results: We introduce the first taxonomy of MAD strategies specifically designed for RE, overcoming the limitations of traditional single-model black-box approaches. Experimental evaluation demonstrates significant improvements in requirements classification accuracy and adaptability, reduced individual model bias, and enhanced capability in handling complex and ambiguous requirements. This work establishes a reusable methodological foundation and empirical evidence for leveraging LLMs in RE.
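The debate loop described above (multiple agents iteratively refining a requirements classification from different perspectives, then converging) can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's implementation: real agents would be LLM calls with distinct persona prompts, whereas here each agent is a stub heuristic so the loop is self-contained and runnable.

```python
from collections import Counter

def agent_classify(requirement, persona, peer_views=None):
    """One debate turn: an agent labels a requirement as functional (FR) or
    non-functional (NFR), optionally after seeing peers' previous labels.
    The keyword heuristic stands in for an LLM call (hypothetical)."""
    nfr_cues = ("performance", "secure", "available", "usab", "latency")
    leaning = "NFR" if any(c in requirement.lower() for c in nfr_cues) else "FR"
    if peer_views:
        # Cooperative refinement: defer to a clear majority of peers.
        majority, count = Counter(peer_views).most_common(1)[0]
        if count > len(peer_views) / 2:
            return majority
    return leaning

def debate(requirement, personas, rounds=3):
    """Iterative MAD loop: each round, agents re-answer with peer labels
    as debate context; the final label is decided by majority vote."""
    views = [agent_classify(requirement, p) for p in personas]
    for _ in range(rounds - 1):
        views = [agent_classify(requirement, p, peer_views=views)
                 for p in personas]
    return Counter(views).most_common(1)[0][0]

label = debate("The system shall respond within 200 ms latency.",
               personas=["analyst", "tester", "architect"])
print(label)  # NFR
```

The persona names and the two-label scheme are illustrative assumptions; the point is the shape of the strategy: independent first answers, rounds of peer-informed revision, then an aggregation rule.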

📝 Abstract
Context: Large Language Model (LLM) agents are becoming widely used for various Requirements Engineering (RE) tasks. Research on improving their accuracy mainly focuses on prompt engineering, model fine-tuning, and retrieval-augmented generation. However, these methods often treat models as isolated black boxes, relying on single-pass outputs without iterative refinement or collaboration, which limits robustness and adaptability. Objective: We propose that, just as human debates enhance accuracy and reduce bias in RE tasks by incorporating diverse perspectives, different LLM agents debating and collaborating may achieve similar improvements. Our goal is to investigate whether Multi-Agent Debate (MAD) strategies can enhance RE performance. Method: We conducted a systematic study of existing MAD strategies across various domains to identify their key characteristics. To assess their applicability in RE, we implemented and tested a preliminary MAD-based framework for RE classification. Results: Our study identified and categorized several MAD strategies, leading to a taxonomy outlining their core attributes. Our preliminary evaluation demonstrated the feasibility of applying MAD to RE classification. Conclusions: MAD presents a promising approach for improving LLM accuracy in RE tasks. This study provides a foundational understanding of MAD strategies, offering insights for future research and refinements in RE applications.
Problem

Research questions and friction points this paper is trying to address.

Enhancing Requirements Engineering accuracy with multi-agent LLM debates
Investigating collaborative debate strategies for LLM robustness in RE
Developing a taxonomy for multi-agent debate approaches in RE
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Agent Debate enhances LLM collaboration
Diverse perspectives improve RE task accuracy
Taxonomy of MAD strategies for RE applications
Marc Oriol
Universitat Politècnica de Catalunya, Barcelona, Spain
Quim Motger
Universitat Politècnica de Catalunya, Barcelona, Spain
Jordi Marco
Associate Professor, Universitat Politècnica de Catalunya
Service Oriented Computing, Non-Functional Requirements, Software Engineering, Computer Graphics
Xavier Franch
Universitat Politècnica de Catalunya, Barcelona, Spain