POIROT: Interrogating Agents for Failure Detection in Multi-Agent Systems

📅 2026-06-01
📈 Citations: 0
Influential: 0

🤖 AI Summary
Multi-agent systems driven by large language models (LLMs) are prone to subtle, emergent failures and hallucinations, which hinders their deployment in safety-critical applications. This work proposes POIROT, a decentralized diagnostic protocol that uses the system's own agents, and the epistemic diversity among them, as a built-in fault-detection layer: agents interrogate one another and audit results collectively, without external supervision. Because it avoids the single point of failure inherent in centralized evaluation, the approach outperforms single-LLM evaluator baselines across the evaluated settings, with gains that grow with task complexity (OR = 1.60, p = 0.008), agent count, and fault dimensionality. The authors release POIROT as an open-source library alongside BLAME, a benchmark for fault attribution in safety-critical multi-agent systems.
📝 Abstract
Orchestrating Large Language Models into Multi-Agent Systems (LLM-MAS) has unlocked remarkable reasoning capabilities, yet emergent failures and hallucinations that resist characterisation block their deployment in safety-critical domains -- a gap made legally untenable by emerging AI regulation. Existing evaluation paradigms share a common flaw: centralised judgment creates single points of failure and demands domain-specific expertise. Here we present POIROT, a protocol that repurposes a system's own agents as its diagnostic layer, leveraging the epistemic diversity already present in the architecture. Across evaluated settings, POIROT outperforms single-LLM evaluator baselines, with gains that scale with problem complexity (OR = 1.60, p = 0.008), agent count, and fault dimensionality, persisting under compound fault conditions. These results demonstrate that safety oversight need not be externalised: the agents executing a role carry sufficient collective intelligence to audit it. We release POIROT as an open-source library alongside BLAME, a benchmark for fault attribution in safety-critical multi-agent systems.
Problem

Research questions and friction points this paper is trying to address.

failure detection
multi-agent systems
hallucination
safety-critical
LLM-MAS
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-agent systems
failure detection
epistemic diversity
self-auditing
LLM orchestration
🔎 Similar Papers
2024-03-04
Proceedings of the 17th International Conference on Agents and Artificial Intelligence
Citations: 3
Iñaki Dellibarda Varela
Center for Automation and Robotics, Spanish National Research Council (CSIC-UPM), Madrid, Spain.
R. Sendra-Arranz
Center for Automation and Robotics, Spanish National Research Council (CSIC-UPM), Madrid, Spain.
Pablo Romero-Sorozabal
Center for Automation and Robotics, Spanish National Research Council (CSIC-UPM), Madrid, Spain.
J. M. Valverde-García
Center for Automation and Robotics, Spanish National Research Council (CSIC-UPM), Madrid, Spain.
Annemarie F. Laudanski
Center for Automation and Robotics, Spanish National Research Council (CSIC-UPM), Madrid, Spain.
Álvaro Gutiérrez
ETSI Telecomunicación, Universidad Politécnica de Madrid (UPM), Madrid, Spain.
Eduardo Rocon
Center for Automation and Robotics, Spanish National Research Council (CSIC-UPM), Madrid, Spain.
Manuel Cebrian
Spanish National Research Council
Computational Social Science, Artificial Intelligence