Attention Knows Whom to Trust: Attention-based Trust Management for LLM Multi-Agent Systems

📅 2025-06-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: LLM-based multi-agent systems (LLM-MAS) lack robustness because agents blindly trust incoming messages; existing work addresses individual credibility threats in isolation and offers no systematic model of credibility. Method: We propose an attention-based framework for assessing message credibility. For the first time, we analyze the behavior of the LLM's internal attention heads across six orthogonal trust dimensions and find that specific heads intrinsically detect distinct categories of trust violations. This enables lightweight credibility inference that needs no prompts, no external verifier, and no training. We further design a dual-granularity trust management system (TMS) that operates at both the message level and the agent level. Contribution/Results: Extensive multi-task experiments demonstrate significant improvements in system robustness against adversarial and noisy inputs. The approach is modular and plug-and-play, and requires no architectural or training modifications to the underlying LLMs.

📝 Abstract

Large Language Model-based Multi-Agent Systems (LLM-MAS) have demonstrated strong capabilities in solving complex tasks but remain vulnerable when agents receive unreliable messages. This vulnerability stems from a fundamental gap: LLM agents treat all incoming messages equally without evaluating their trustworthiness. Although some existing studies address trustworthiness, each focuses on a single type of harm rather than analyzing trustworthiness holistically from multiple perspectives. In this work, we propose Attention Trust Score (A-Trust), a lightweight, attention-based method for evaluating message trustworthiness. Inspired by the human communication literature [1], we systematically analyze attention behaviors across six orthogonal trust dimensions and find that certain attention heads in the LLM specialize in detecting specific types of violations. Leveraging these insights, A-Trust infers trustworthiness directly from internal attention patterns without requiring external prompts or verifiers. Building upon A-Trust, we develop a principled and efficient trust management system (TMS) for LLM-MAS that enables both message-level and agent-level trust assessment. Experiments across diverse multi-agent settings and tasks demonstrate that applying our TMS significantly enhances robustness against malicious inputs.
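To make the idea of reading trust signals from attention more concrete, the following is a minimal sketch, not the paper's implementation. It assumes attention weights are available as a nested list indexed `[head][query][key]`, that the incoming message occupies a known span of key positions, and that a set of relevant heads has already been identified. The function names, the head selection, and the plain averaging are all hypothetical; the abstract does not state how A-Trust turns attention patterns into a score.

```python
from typing import Sequence, Tuple

# Assumed layout: attention[head][query_token][key_token], rows sum to 1.
Attention = Sequence[Sequence[Sequence[float]]]


def head_mass_on_span(attn_head: Sequence[Sequence[float]],
                      span: Tuple[int, int]) -> float:
    """Average, over query tokens, of the attention mass that one head
    places on the key positions in [start, end)."""
    start, end = span
    per_query = [sum(row[start:end]) for row in attn_head]
    return sum(per_query) / len(per_query)


def span_attention_feature(attention: Attention,
                           heads: Sequence[int],
                           span: Tuple[int, int]) -> float:
    """Hypothetical feature: mean attention mass on the message span,
    averaged over a caller-chosen set of heads (an assumption here)."""
    masses = [head_mass_on_span(attention[h], span) for h in heads]
    return sum(masses) / len(masses)


# Toy attention: 2 heads, 2 query tokens, 4 key tokens.
toy_attention = [
    [[0.1, 0.2, 0.3, 0.4], [0.25, 0.25, 0.25, 0.25]],
    [[0.7, 0.1, 0.1, 0.1], [0.4, 0.3, 0.2, 0.1]],
]
message_span = (1, 3)  # key positions 1 and 2 hold the incoming message
feature = span_attention_feature(toy_attention, heads=[0, 1], span=message_span)
```

In a real system the attention tensor would come from the LLM's forward pass, and the mapping from such features to a trust score per dimension would follow whatever procedure the paper defines.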

Problem

Research questions and friction points this paper is trying to address.

LLM agents do not evaluate the trustworthiness of incoming messages

Existing studies target a single type of harm and do not assess trustworthiness holistically from multiple perspectives

A lightweight method is needed to assess message trustworthiness in multi-agent systems

Innovation

Methods, ideas, or system contributions that make the work stand out.

Attention Trust Score (A-Trust) evaluates message trustworthiness directly from the LLM's internal attention patterns, without external prompts or verifiers

Analyzes attention behaviors across six trust dimensions

Enables message-level and agent-level trust assessment
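As an illustration of what message-level and agent-level trust assessment could look like in code, here is a minimal sketch under stated assumptions. It takes six per-dimension scores in [0, 1] as given (for example, produced by an attention-based scorer), uses the weakest dimension as the message score, and tracks agent trust as an exponential moving average. The class name, the `min` aggregation, the threshold, and the decay factor are all hypothetical choices, not details taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, Sequence


@dataclass
class TrustManager:
    """Hypothetical dual-granularity trust tracker (illustrative only)."""
    threshold: float = 0.5  # assumed cutoff for accepting a message or agent
    decay: float = 0.8      # assumed weight on an agent's trust history
    agent_trust: Dict[str, float] = field(default_factory=dict)

    def message_trust(self, dimension_scores: Sequence[float]) -> float:
        # Conservative assumption: a message is only as trustworthy
        # as its weakest trust dimension.
        return min(dimension_scores)

    def observe(self, sender: str, dimension_scores: Sequence[float]) -> bool:
        """Score one message, update the sender's running trust,
        and report whether the message itself passes the threshold."""
        score = self.message_trust(dimension_scores)
        previous = self.agent_trust.get(sender, score)
        self.agent_trust[sender] = self.decay * previous + (1 - self.decay) * score
        return score >= self.threshold

    def is_trusted(self, sender: str) -> bool:
        """Agent-level decision; unseen agents are treated as untrusted here."""
        return self.agent_trust.get(sender, 0.0) >= self.threshold


tms = TrustManager()
accepted_a = tms.observe("agent_a", [0.9] * 6)
accepted_b = tms.observe("agent_b", [0.9, 0.9, 0.2, 0.9, 0.9, 0.9])
accepted_b_again = tms.observe("agent_b", [1.0] * 6)
```

With these settings, one good message from `agent_b` is accepted at the message level, but the agent's running trust remains below the threshold because of its earlier low-scoring message. This is the kind of separation between the two granularities that the summary describes.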
