🤖 AI Summary
In real-world negotiation, large language models (LLMs) show limited strategic depth, weak modeling of human preferences, and little opponent-aware reasoning (OAR), while existing benchmarks fail to capture genuine game-theoretic complexity. To address this, we propose BargainArena, a novel benchmark of realistic, multi-scenario bargaining tasks. Alongside the benchmark, we introduce a utility-theoretic assessment framework that implicitly rewards OAR and economic rationality, and a structured iterative feedback mechanism compatible with in-context learning (ICL). The approach integrates utility modeling, OAR-enhancing prompting, and multi-dimensional, economically grounded evaluation. Experiments across six complex negotiation scenarios show that LLMs' default strategies are often misaligned with human preferences, and that the structured feedback mechanism substantially improves strategic depth, robustness, and human alignment, as reflected in preference consistency, Pareto efficiency, and agreement rate, narrowing the gap between current LLM capabilities and real-world negotiation requirements.
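As a rough illustration of how such a utility-based iterative feedback loop could be wired together, the sketch below runs repeated negotiation episodes and feeds utility-derived critiques back into the prompt via ICL. The function names, prompt wording, scenario fields, and utility formula are assumptions for illustration only, not the paper's actual implementation.

```python
# Hypothetical sketch of a utility-based iterative feedback loop for an LLM
# bargaining agent. All names (call_llm, scenario keys, prompt wording) are
# illustrative assumptions, not the BargainArena implementation.

def utility(agreed_price, reservation_price, target_price, is_buyer=True):
    """Normalized utility in [0, 1]: 1 at the agent's target price, 0 at its reservation price."""
    span = max(abs(target_price - reservation_price), 1e-9)
    gain = (reservation_price - agreed_price) if is_buyer else (agreed_price - reservation_price)
    return max(0.0, min(1.0, gain / span))

def negotiate_with_feedback(call_llm, scenario, rounds=3):
    """Run several negotiation episodes, appending utility-based critiques to the prompt each round."""
    feedback_history = []
    for _ in range(rounds):
        prompt = scenario["instructions"]
        if feedback_history:
            prompt += "\n\nFeedback on your previous attempts:\n" + "\n".join(feedback_history)
        # call_llm is assumed to run one full negotiation and return the outcome,
        # e.g. {"agreed_price": 42.0, "num_turns": 7}.
        outcome = call_llm(prompt)
        u = utility(outcome["agreed_price"],
                    scenario["reservation_price"],
                    scenario["target_price"])
        feedback_history.append(
            f"- You achieved utility {u:.2f}; reason about the opponent's likely "
            f"reservation price before making further concessions."
        )
    return feedback_history
```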
📝 Abstract
Bargaining, a critical aspect of real-world interactions, presents challenges for large language models (LLMs) due to limitations in strategic depth and adaptation to complex human factors. Existing benchmarks often fail to capture this real-world complexity. To address this and enhance LLM capabilities in realistic bargaining, we introduce a comprehensive framework centered on utility-based feedback. Our contributions are threefold: (1) BargainArena, a novel benchmark dataset with six intricate scenarios (e.g., deceptive practices, monopolies) to facilitate diverse strategy modeling; (2) human-aligned, economically grounded evaluation metrics inspired by utility theory, incorporating agent utility and negotiation power, which implicitly reflect and promote opponent-aware reasoning (OAR); and (3) a structured feedback mechanism enabling LLMs to iteratively refine their bargaining strategies. This mechanism combines productively with in-context learning (ICL) prompts, including those explicitly designed to foster OAR. Experimental results show that LLMs often exhibit negotiation strategies misaligned with human preferences, and that our structured feedback mechanism significantly improves their performance, yielding deeper strategic and opponent-aware reasoning.
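A common utility-theoretic way to formalize agent utility and negotiation power normalizes the agreed price against each agent's reservation and target prices; the definitions below are a sketch under that assumption, not necessarily the paper's exact metrics.

```latex
% Illustrative (assumed) definitions, not necessarily the paper's exact metrics.
% p^{*}: agreed price; p_{\text{res}}: reservation price; p_{\text{tar}}: target price.
U_{\text{buyer}}  = \frac{p^{b}_{\text{res}} - p^{*}}{p^{b}_{\text{res}} - p^{b}_{\text{tar}}},
\qquad
U_{\text{seller}} = \frac{p^{*} - p^{s}_{\text{res}}}{p^{s}_{\text{tar}} - p^{s}_{\text{res}}},
\qquad
\text{Power}_{\text{buyer}} = \frac{U_{\text{buyer}}}{U_{\text{buyer}} + U_{\text{seller}}}.
```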