Collective Hallucination in Multi-Agent LLMs: Modeling and Defense

📅 2026-06-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses how hallucinations propagate, reinforce one another, and amplify in multi-agent large language model systems, undermining the reliability of collective reasoning. The work formulates hallucination at the system level as a dynamic diffusion process over communication topologies and proposes an integrated defense framework that combines confidence-weighted aggregation, adaptive influence modulation, external fact verification, and isolation of unreliable agents. Experiments on TruthfulQA and TriviaQA show that the approach reduces hallucination by up to 39.0% relative to undefended multi-agent reasoning, improves factual accuracy from 0.79 to 0.87, and raises semantic consistency from 0.75 to 0.84. Under adversarial conditions it limits hallucination amplification to 1.08, compared with 1.45 without adaptive control.

📝 Abstract

Hallucinations in large language models (LLMs) create heightened risks in multi-agent settings, where recursive agent interactions can propagate, reinforce, and amplify unsupported claims. This paper models hallucination as a system-level, time-evolving process across a network of interacting LLM agents, where nodes represent agents and edges encode information exchange. The proposed formulation captures how hallucinated claims diffuse through communication topologies, intensify under adversarial perturbations, and affect collective reliability across reasoning rounds. To suppress error propagation, we introduce an interaction-aware control method that combines confidence-weighted aggregation, adaptive impact regulation, external claim verification, and selective isolation of unreliable agents. Experiments on TruthfulQA and TriviaQA show that the proposed method reduces hallucination by up to 39.0% relative to undefended multi-agent reasoning, improves factual accuracy from 0.79 to 0.87, and increases semantic consistency from 0.75 to 0.84. Under adversarial conditions, the method limits hallucination amplification to 1.08, compared with 1.45 without adaptive control, maintaining stable collective behavior across recursive interaction rounds. These results indicate that hallucination in multi-agent LLM systems is governed by both individual model reliability and system-level interaction dynamics, including communication topology, confidence coupling, and recursive information flow.
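To make the defense components named in the abstract more concrete, the following is a minimal sketch of one interaction round. It is not the paper's implementation. The scalar belief model, the function names (`aggregate`, `run_round`, `verify`), the 0.5 decision threshold, the `decay` factor, and the `isolate_below` cutoff are all assumptions introduced here for illustration. Each agent holds a belief in [0, 1] that a claim is true, neighbors' beliefs are averaged with weights equal to confidence times reliability, a hypothetical external verifier lowers the reliability of agents that back the wrong answer, and agents whose reliability falls below a cutoff are ignored by their neighbors.

```python
def aggregate(beliefs, weights):
    """Weighted mean of beliefs; returns None when all weights are zero."""
    total = sum(weights)
    if total == 0:
        return None
    return sum(b * w for b, w in zip(beliefs, weights)) / total


def run_round(beliefs, confidences, reliability, neighbors, isolate_below=0.3):
    """One synchronous round of confidence-weighted aggregation.

    neighbors[i] lists the agents whose messages agent i receives.
    Neighbors with reliability below `isolate_below` (an assumed cutoff)
    are skipped, which stands in for selective isolation. An agent always
    keeps its own belief in the mix.
    """
    updated = []
    for i, nbrs in enumerate(neighbors):
        group = [i] + [j for j in nbrs if reliability[j] >= isolate_below]
        group_beliefs = [beliefs[j] for j in group]
        group_weights = [confidences[j] * reliability[j] for j in group]
        merged = aggregate(group_beliefs, group_weights)
        updated.append(beliefs[i] if merged is None else merged)
    return updated


def verify(beliefs, reliability, claim_is_true, decay=0.5):
    """Hypothetical external check: scale down the reliability of every
    agent whose belief falls on the wrong side of 0.5."""
    return [
        r * decay if (b > 0.5) != claim_is_true else r
        for b, r in zip(beliefs, reliability)
    ]


# Four fully connected agents; agent 0 confidently asserts a false claim.
beliefs = [1.0, 0.2, 0.2, 0.2]
confidences = [0.9, 0.5, 0.5, 0.5]
neighbors = [[1, 2, 3], [0, 2, 3], [0, 1, 3], [0, 1, 2]]

# Undefended: every agent is fully trusted and nothing is verified.
undefended = run_round(beliefs, confidences, [1.0] * 4, neighbors)

# Defended: verify first, then aggregate with the reduced reliability.
trust = verify(beliefs, [1.0] * 4, claim_is_true=False)
defended = run_round(beliefs, confidences, trust, neighbors)

print("undefended:", undefended)
print("defended:  ", defended)
```

In this toy setup the confident hallucinating agent pulls the group's belief in the false claim upward when no control is applied, and the verified, reliability-weighted round pulls it up less. The numbers it prints are properties of the toy model only and are unrelated to the results reported in the abstract.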

Problem

Research questions and friction points this paper is trying to address.

Collective Hallucination

Multi-Agent LLMs

Error Propagation

Interaction Dynamics

Factual Reliability

Innovation

Methods, ideas, or system contributions that make the work stand out.

Collective Hallucination

Multi-Agent LLMs

Interaction-Aware Control