🤖 AI Summary
To address the lack of fine-grained semantic modeling and poor interpretability in detecting malicious agents within large language model (LLM)-based multi-agent systems (MAS), this paper proposes XG-Guard, the first bi-level graph-based anomaly detection framework for this setting. It jointly models sentence-level and token-level semantic representations to construct dialogue graphs and capture dynamic topic evolution. A theme-based anomaly detector and a token-level contribution quantification module are integrated through an interpretable bi-level score fusion mechanism for precise attribution. Evaluated across diverse MAS topologies and adversarial scenarios, XG-Guard substantially improves detection robustness, achieving an average F1-score gain of 12.6%, while delivering fine-grained token-level explanations. By combining high accuracy with strong interpretability, XG-Guard offers a practical path toward safeguarding safety-critical MAS deployments.
📝 Abstract
Large language model (LLM)-based multi-agent systems (MAS) have shown strong capabilities in solving complex tasks. As MAS become increasingly autonomous in safety-critical tasks, detecting malicious agents has become a pressing security concern. Although existing graph anomaly detection (GAD)-based defenses can identify anomalous agents, they rely mainly on coarse sentence-level information and overlook fine-grained lexical cues, leading to suboptimal performance. Moreover, their lack of interpretability limits their reliability and real-world applicability. To address these limitations, we propose XG-Guard, an explainable and fine-grained safeguarding framework for detecting malicious agents in MAS. To incorporate both coarse- and fine-grained textual information for anomalous agent identification, we utilize a bi-level agent encoder to jointly model the sentence- and token-level representations of each agent. A theme-based anomaly detector further captures the evolving discussion focus in MAS dialogues, while a bi-level score fusion mechanism quantifies token-level contributions for explanation. Extensive experiments across diverse MAS topologies and attack scenarios demonstrate the robust detection performance and strong interpretability of XG-Guard.
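To make the bi-level score fusion idea concrete, the sketch below shows one simple way such a mechanism could combine a sentence-level anomaly score with softmax-normalized token-level scores, keeping per-token contributions as the explanation. This is a minimal illustration under assumed design choices (the convex-combination weight `alpha`, the softmax normalization, and all function names are hypothetical), not the paper's actual formulation.

```python
import numpy as np

def bilevel_score_fusion(sentence_score, token_scores, alpha=0.6):
    """Fuse a sentence-level anomaly score with token-level scores.

    Returns a single agent-level anomaly score plus per-token
    contribution weights usable for explanation. All weighting
    choices here are illustrative assumptions.
    """
    token_scores = np.asarray(token_scores, dtype=float)
    # Softmax-normalize token scores so the contributions sum to 1,
    # making each token's share of the anomaly evidence interpretable.
    shifted = np.exp(token_scores - token_scores.max())
    contributions = shifted / shifted.sum()
    # Token-level aggregate: contribution-weighted mean of raw scores,
    # emphasizing the most anomalous tokens.
    token_level = float(contributions @ token_scores)
    # Convex combination of the two granularities.
    fused = alpha * sentence_score + (1 - alpha) * token_level
    return fused, contributions

# Example: one agent utterance with three tokens, one of which
# carries most of the anomaly signal.
fused, contrib = bilevel_score_fusion(0.8, [0.1, 0.9, 0.2])
```

In this toy form, a high-scoring token both raises the fused agent score and dominates the contribution vector, which is what allows token-level attribution of why an agent was flagged.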