Courtroom-Style Multi-Agent Debate with Progressive RAG and Role-Switching for Controversial Claim Verification

📅 2026-03-30
🤖 AI Summary
This work addresses the susceptibility of large language models to hallucination and shallow reasoning in high-stakes fact-checking. To overcome limitations of single-pass retrieval and unstructured multi-agent debate, the authors propose PROClaim, a courtroom-inspired multi-agent framework that reframes claim verification as structured adversarial deliberation. PROClaim dynamically refines its evidence pool through role-specialized agents, progressive retrieval-augmented generation (P-RAG), evidence negotiation, and aggregation by heterogeneous judge agents, thereby enhancing calibration, robustness, and response diversity. Evaluated zero-shot on the Check-COVID benchmark, PROClaim achieves an accuracy of 81.7%, representing a 10.0 percentage point improvement over standard multi-agent debate, with P-RAG alone contributing a 7.5 percentage point gain.
📝 Abstract
Large language models (LLMs) remain unreliable for high-stakes claim verification due to hallucinations and shallow reasoning. While retrieval-augmented generation (RAG) and multi-agent debate (MAD) address these weaknesses, they are limited by one-pass retrieval and unstructured debate dynamics. We propose a courtroom-style multi-agent framework, PROClaim, that reformulates verification as a structured, adversarial deliberation. Our approach integrates specialized roles (e.g., Plaintiff, Defense, Judge) with Progressive RAG (P-RAG) to dynamically expand and refine the evidence pool during the debate. Furthermore, we employ evidence negotiation, self-reflection, and heterogeneous multi-judge aggregation to enforce calibration, robustness, and diversity. In zero-shot evaluations on the Check-COVID benchmark, PROClaim achieves 81.7% accuracy, outperforming standard multi-agent debate by 10.0 percentage points, with P-RAG driving the primary performance gains (+7.5 pp). We ultimately demonstrate that structural deliberation and model heterogeneity effectively mitigate systematic biases, providing a robust foundation for reliable claim verification. Our code and data are publicly available at https://github.com/mnc13/PROClaim.
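The two mechanisms the abstract names, progressive retrieval and multi-judge aggregation, can be illustrated with a minimal sketch. This is not the PROClaim implementation: the function names (`progressive_retrieve`, `aggregate_verdicts`), the caller-supplied `retrieve` and `refine_query` callables, the stopping rule, the majority-vote aggregation, and the verdict labels are all assumptions chosen for illustration. The repository linked above is the place to look for the authors' actual design.

```python
from collections import Counter
from typing import Callable, List


def progressive_retrieve(
    claim: str,
    retrieve: Callable[[str], List[str]],            # hypothetical retriever
    refine_query: Callable[[str, List[str]], str],   # hypothetical query refiner
    rounds: int = 3,
) -> List[str]:
    """Grow an evidence pool over several rounds instead of a single pass."""
    pool: List[str] = []
    seen = set()
    query = claim
    for _ in range(rounds):
        added = False
        for doc in retrieve(query):
            if doc not in seen:          # keep only evidence not already pooled
                seen.add(doc)
                pool.append(doc)
                added = True
        if not added:                    # assumed stopping rule: no new evidence
            break
        query = refine_query(claim, pool)
    return pool


def aggregate_verdicts(verdicts: List[str]) -> str:
    """Majority vote over judge verdicts; a tie for first place is undecided."""
    if not verdicts:
        raise ValueError("at least one verdict is required")
    counts = Counter(verdicts)
    top_count = max(counts.values())
    winners = [label for label, n in counts.items() if n == top_count]
    return winners[0] if len(winners) == 1 else "UNDECIDED"


# Toy usage with an in-memory "corpus" keyed by query string.
corpus = {"q0": ["a", "b"], "q1": ["b", "c"], "q2": ["c"]}
pool = progressive_retrieve(
    "q0",
    retrieve=lambda q: corpus.get(q, []),
    refine_query=lambda claim, pool: f"q{len(pool) - 1}",
)
verdict = aggregate_verdicts(["SUPPORT", "SUPPORT", "REFUTE"])
```

In a real system the `retrieve` callable would query a document index and `refine_query` would be driven by the debating agents' arguments. The sketch only shows the control flow: evidence accumulates across rounds, and several independent judges vote on the final label.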
Problem

Research questions and friction points this paper is trying to address.

claim verification
hallucination
retrieval-augmented generation
multi-agent debate
reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Progressive RAG
Multi-Agent Debate
Courtroom-Style Deliberation
Evidence Negotiation
Model Heterogeneity
Masnun Nuha Chowdhury
Systems and Software Lab (SSL), Department of Computer Science and Engineering, Islamic University of Technology, Dhaka, Bangladesh
Nusrat Jahan Beg
Systems and Software Lab (SSL), Department of Computer Science and Engineering, Islamic University of Technology, Dhaka, Bangladesh
Umme Hunny Khan
Systems and Software Lab (SSL), Department of Computer Science and Engineering, Islamic University of Technology, Dhaka, Bangladesh
Syed Rifat Raiyan
Systems and Software Lab (SSL), Department of Computer Science and Engineering, Islamic University of Technology, Dhaka, Bangladesh
Md Kamrul Hasan
Department of Computer Science
Smart Health, Noninvasive Blood Test, Image processing
Hasan Mahmud
Postdoctoral Research Associate, Rochester Institute of Technology
Information Systems, Algorithmic decision-making, HCI/Human-AI interaction