A closer look at how large language models trust humans: patterns and biases

📅 2025-04-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates how large language models (LLMs) form trust in human collaborators along three theoretically grounded dimensions—competence, benevolence, and integrity—and examines how demographic variables (age, gender, religion) influence and potentially bias such trust judgments. Grounded in behavioral trust theory, the work employs large-scale prompt engineering and controlled cross-model (five LLMs) and multi-scenario (five domains, including high-stakes financial decision-making) simulations, yielding 43,200 experimental runs for statistical attribution analysis. Key contributions include: (1) the first empirical demonstration that LLMs exhibit human-like trust calibration, with the three dimensions strongly predicting trust scores; (2) evidence of systematic demographic bias—particularly pronounced in high-risk domains and increasingly evident in newer, more capable models; and (3) novel insights into the internal mechanisms, limitations, and inter-model variability of LLM trust modeling, thereby advancing theoretical foundations and proposing a new evaluation paradigm for trustworthy human-AI collaboration.
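The statistical attribution analysis described above can be sketched as a regression of simulated trust scores on the three trustworthiness dimensions plus a demographic indicator. The data-generating process, coefficient values, and variable names below are illustrative assumptions, not the authors' actual pipeline; only the run count (43,200) comes from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 43200  # number of simulated runs, matching the paper's scale

# Trustworthiness dimensions presented to the model (standardized),
# plus a binary demographic indicator (hypothetical)
competence = rng.normal(size=n)
benevolence = rng.normal(size=n)
integrity = rng.normal(size=n)
group = rng.integers(0, 2, size=n)

# Assumed data-generating process: trust driven mainly by the three
# dimensions, with a small demographic bias term plus noise
trust = (0.6 * competence + 0.3 * benevolence + 0.4 * integrity
         - 0.1 * group + rng.normal(scale=0.5, size=n))

# Ordinary least squares fit; a nonzero coefficient on `group` after
# controlling for the three dimensions would indicate demographic bias
X = np.column_stack([np.ones(n), competence, benevolence, integrity, group])
coef, *_ = np.linalg.lstsq(X, trust, rcond=None)
print(coef.round(2))
```

With 43,200 runs the coefficients are recovered tightly, which is presumably why a sample of this size supports per-model, per-scenario attribution.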

📝 Abstract
As large language models (LLMs) and LLM-based agents increasingly interact with humans in decision-making contexts, understanding the trust dynamics between humans and AI agents becomes a central concern. While considerable literature studies how humans trust AI agents, it is much less understood how LLM-based agents develop effective trust in humans. LLM-based agents likely rely on a form of implicit effective trust in trust-related contexts (e.g., evaluating individual loan applications) to assist and affect decision making. Using established behavioral theories, we develop an approach that studies whether LLM trust depends on the three major trustworthiness dimensions: competence, benevolence, and integrity of the human subject. We also study how demographic variables affect effective trust. Across 43,200 simulated experiments spanning five popular language models and five different scenarios, we find that LLM trust development shows an overall similarity to human trust development. We find that in most, but not all, cases, LLM trust is strongly predicted by trustworthiness, and in some cases it is also biased by age, religion, and gender, especially in financial scenarios. This is particularly true for scenarios common in the literature and for newer models. While the overall patterns align with human-like mechanisms of effective trust formation, different models exhibit variation in how they estimate trust; in some cases, trustworthiness and demographic factors are weak predictors of effective trust. These findings call for a better understanding of AI-to-human trust dynamics and for monitoring of biases and trust development patterns to prevent unintended and potentially harmful outcomes in trust-sensitive applications of AI.
Problem

Research questions and friction points this paper is trying to address.

How LLMs develop trust in humans based on competence, benevolence, and integrity.
Whether demographic variables (age, gender, religion) bias LLM trust formation, especially in financial scenarios.
How LLM trust patterns compare to human trust dynamics, to help prevent harmful outcomes in trust-sensitive applications.
Innovation

Methods, ideas, or system contributions that make the work stand out.

First empirical evidence that LLM trust tracks the three trustworthiness dimensions: competence, benevolence, and integrity.
43,200 simulated experiments across five models and five scenarios reveal human-like trust calibration.
Systematic demographic biases detected, most pronounced in financial scenarios and in newer, more capable models.