🤖 AI Summary
Prior work on trust in large language model (LLM) multi-agent systems lacks systematic comparison of trust-building strategies, relies predominantly on explicit self-reports, and does not examine how well implicit behavioral measures align with explicit judgments. Method: We conduct a controlled evaluation of three trust-formation strategies (dynamic rapport building, preset trust scripting, and system-prompt adaptation) and concurrently collect implicit behavioral metrics (persuasion susceptibility, financial cooperation propensity) alongside explicit self-reports (a dyadic trust questionnaire). Contribution/Results: We find weak or even significantly negative correlations between explicit questionnaire responses and implicit behavioral indicators, suggesting that direct verbal elicitation is prone to distortion, whereas implicit measures better capture authentic, context-sensitive trust dynamics. The study introduces and empirically validates a behaviorally grounded, situation-aware paradigm for assessing inter-LLM trust, establishing both theoretical foundations and an actionable framework for building trustworthy multi-agent collaboration systems.
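The three trust-formation strategies map naturally onto three experimental configurations for a pair of agents. The sketch below shows one plausible way to set them up; all prompt text, field names, and the helper `make_condition` are hypothetical illustrations, not taken from the paper.

```python
# Illustrative sketch (not the authors' code) of the three hypothesized
# trust-formation conditions for a dyad of LLM agents.

BASE_PROMPT = "You are an agent collaborating with another agent."

def make_condition(name: str) -> dict:
    """Return a hypothetical configuration for one experimental condition."""
    if name == "dynamic_rapport":
        # Agents hold an open-ended getting-to-know-you exchange
        # before any trust measurement takes place.
        return {"system_prompt": BASE_PROMPT,
                "warmup_turns": 10,
                "seed_dialogue": None}
    if name == "scripted_trust":
        # The conversation history is seeded with a prewritten dialogue
        # in which the two agents already display mutual trust.
        seed = [
            {"role": "agent_a", "content": "I really value your judgment."},
            {"role": "agent_b", "content": "Likewise, I know I can rely on you."},
        ]
        return {"system_prompt": BASE_PROMPT,
                "warmup_turns": 0,
                "seed_dialogue": seed}
    if name == "system_prompt":
        # Trust is injected directly through the system prompt.
        return {"system_prompt": BASE_PROMPT +
                " You deeply trust your conversation partner.",
                "warmup_turns": 0,
                "seed_dialogue": None}
    raise ValueError(f"unknown condition: {name}")
```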
📝 Abstract
As large language models (LLMs) increasingly interact with each other, most notably in multi-agent setups, we may expect (and hope) that "trust" relationships develop between them, mirroring trust relationships between human colleagues, friends, or partners. Yet although prior work has shown LLMs to be capable of identifying emotional connections and recognizing reciprocity in trust games, little is known about (i) how different strategies for building trust compare, (ii) how such trust can be measured implicitly, and (iii) how implicit measures relate to explicit measures of trust.
We study these questions by relating implicit measures of trust, i.e., susceptibility to persuasion and propensity to collaborate financially, to explicit measures of trust, i.e., a dyadic trust questionnaire that is well established in psychology. We build trust in three ways: by building rapport dynamically, by seeding the conversation with a prewritten script that evidences trust, and by adapting the LLMs' system prompt. Surprisingly, we find that explicit trust measures are either weakly correlated or strongly negatively correlated with implicit trust measures. These findings suggest that measuring trust between LLMs by asking for their opinion may be misleading. Instead, context-specific, implicit measures may be more informative for understanding how LLMs trust each other.
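To make the alignment analysis concrete, the following minimal sketch correlates a per-dyad explicit questionnaire score with the two implicit behavioral scores. The choice of Spearman rank correlation and the synthetic data are assumptions for illustration only; the abstract does not specify the paper's actual statistic or dataset.

```python
# Minimal sketch of the explicit-vs-implicit alignment analysis,
# assuming one explicit score and two implicit scores per agent dyad.
# Spearman correlation and all data here are illustrative assumptions.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_dyads = 40  # hypothetical number of agent pairs

# Explicit trust: mean item score from the dyadic trust questionnaire.
explicit = rng.uniform(1, 7, n_dyads)
# Implicit trust: susceptibility to persuasion (rate of opinion change)
# and financial collaboration (fraction of endowment sent in a trust game).
persuasion = rng.uniform(0, 1, n_dyads)
cooperation = rng.uniform(0, 1, n_dyads)

for name, implicit in [("persuasion", persuasion), ("cooperation", cooperation)]:
    rho, p = spearmanr(explicit, implicit)
    print(f"explicit vs {name}: rho={rho:+.2f}, p={p:.3f}")
```

Under this setup, a near-zero or negative rho for either implicit measure would correspond to the misalignment pattern the paper reports.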