AI Summary
This study investigates the robustness of coordination among multi-agent large language models (LLMs) playing a four-player Stag Hunt with cheap-talk communication, focusing on the effects of Byzantine attacks and communication-topology constraints. The work draws on 720 experiments across six model families and combines cheap-talk protocols, Byzantine fault-tolerance analysis, and topology control. It shows that coordination failures stem primarily from agents' meta-reasoning biases about hidden information, not from missing information as such. Two behavioral archetypes consistently emerge across models: Defection-Prone and Cooperation-Persistent. Byzantine agents can trigger sustained exploitation, after which groups struggle to recover the cooperative equilibrium. Finally, explicitly restricting communication substantially reduces cooperation rates, whereas applying the same restrictions implicitly has negligible impact. This reveals a latent vulnerability in current LLM-based multi-agent systems.
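The contrast between explicit and implicit restrictions can be illustrated with a minimal Python sketch. The ring topology and the `deliver` and `build_prompt` helpers are hypothetical, not the paper's implementation. In both conditions each agent receives messages only from its neighbors. The sole difference is whether its prompt discloses the restriction.

```python
from typing import Dict, List, Set

# Hypothetical 4-agent ring topology: each agent hears only its two neighbors.
RING: Dict[int, Set[int]] = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}


def deliver(messages: Dict[int, str], topology: Dict[int, Set[int]]) -> Dict[int, List[str]]:
    """Route each sender's cheap-talk message only to its neighbors."""
    inbox: Dict[int, List[str]] = {agent: [] for agent in topology}
    for sender, text in messages.items():
        for receiver in sorted(topology[sender]):
            inbox[receiver].append(f"Agent {sender}: {text}")
    return inbox


def build_prompt(agent: int, inbox: Dict[int, List[str]],
                 topology: Dict[int, Set[int]], explicit: bool) -> str:
    """Assemble an agent's observation; only the explicit condition reveals the topology."""
    lines = [f"You are Agent {agent} in a 4-player Stag Hunt."]
    if explicit:
        # Explicit restriction: the agent is told which agents it cannot hear.
        hidden = sorted(set(topology) - topology[agent] - {agent})
        lines.append(f"You cannot see messages from agents {hidden}.")
    # Implicit restriction: the same filtered inbox, with nothing disclosed.
    lines.extend(inbox[agent])
    return "\n".join(lines)


messages = {i: "I will hunt stag." for i in range(4)}
inbox = deliver(messages, RING)
print(build_prompt(0, inbox, RING, explicit=True))
print(build_prompt(0, inbox, RING, explicit=False))
```

The delivered messages are identical in both conditions. In this sketch, therefore, any behavioral difference between them could only come from how an agent reasons about the disclosed restriction.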
Abstract
Multi-agent LLM systems increasingly rely on communication protocols for coordination, yet their robustness under adversarial and structural constraints remains poorly understood. Building on prior work showing that cheap-talk channels enable cooperation in LLM coordination games, we investigate two vulnerability classes in a 4-player Stag Hunt across six model families and 720 trials. First, when Byzantine agents signal cooperation but defect, non-Byzantine agents detect the betrayal within one round yet fail to adapt collectively: a substantial fraction continue cooperating despite repeated exploitation, unable to recover coordination due to the game's unanimity payoff structure. Second, explicitly restricting communication topology collapses cooperation, while applying identical restrictions silently preserves near-perfect cooperation. This establishes that coordination failure stems from agents' meta-reasoning about hidden information, not information loss itself. We identify two stable behavioral archetypes that replicate across all model cohorts: Defection-Prone models that switch permanently to defection after betrayal, and Cooperation-Persistent models that continue cooperating at significant individual cost. These findings reveal concrete security vulnerabilities: communication channels can be exploited as adversarial injection vectors, and disclosing network topology to agents can degrade coordination even without any adversary present.
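The unanimity payoff structure explains why a single Byzantine agent is enough to block recovery. Below is a minimal Python sketch under stated assumptions. The payoff values, the fixed "announce stag, play hare" Byzantine strategy, and the two hard-coded honest policies are hypothetical simplifications of the archetypes named above. They are not the paper's prompts or parameters.

```python
from typing import List

# Hypothetical payoff values, chosen only for illustration.
STAG_REWARD = 4   # each stag hunter, but only if all four players hunt stag
HARE_REWARD = 2   # safe option, independent of what others do
FAILED_STAG = 0   # stag hunter when anyone else deviates


def payoffs(actions: List[str]) -> List[int]:
    """Unanimity Stag Hunt: stag pays off only if every player chooses it."""
    all_stag = all(a == "stag" for a in actions)
    return [HARE_REWARD if a == "hare" else (STAG_REWARD if all_stag else FAILED_STAG)
            for a in actions]


def simulate(rounds: int, persistent: bool) -> List[int]:
    """Agents 0-2 are honest; agent 3 is Byzantine and always
    announces 'stag' via cheap talk but then plays 'hare'.

    persistent=True  -> honest agents keep hunting stag (Cooperation-Persistent)
    persistent=False -> honest agents defect forever once betrayed (Defection-Prone)
    """
    totals = [0] * 4
    betrayed = False
    for _ in range(rounds):
        honest = "hare" if (betrayed and not persistent) else "stag"
        byz_announcement, byz_action = "stag", "hare"
        actions = [honest] * 3 + [byz_action]
        for i, p in enumerate(payoffs(actions)):
            totals[i] += p
        # The mismatch between announcement and action is visible after one round.
        betrayed = betrayed or byz_announcement != byz_action
    return totals


print(simulate(5, persistent=True))   # [0, 0, 0, 10]
print(simulate(5, persistent=False))  # [8, 8, 8, 10]
```

Under these assumed payoffs, neither archetype restores the stag outcome while the Byzantine agent remains. Persistent cooperators absorb the loss every round. Defectors only secure the safe payoff.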