🤖 AI Summary
This study investigates whether large language models (LLMs) can act as cooperative agents that help another player win a multi-player UNO game, rather than trying to win themselves.
Method: We build a tool that embeds decoder-only LLMs in the RLCard environment, letting them act as decision-making agents throughout full game episodes. The models receive full game-state information as simple text prompts under two distinct prompting strategies, and are instructed to help another player win instead of competing.
Contribution/Results: Experiments span LLMs ranging from 1B to 70B parameters and examine how model scale affects performance. All models outperform a random baseline when playing UNO, but only a few significantly improve the assisted player’s win rate; these gains are reported for larger models (≥7B parameters). The results delineate both the feasibility and the limits of LLMs as rule-abiding collaborators in a fully specified multi-agent environment.
📝 Abstract
LLMs promise to assist humans -- not just by answering questions, but by offering useful guidance across a wide range of tasks. But how far does that assistance go? Can an agent based on a large language model actually help someone accomplish their goal as an active participant? We test this question by engaging an LLM in UNO, a turn-based card game, asking it not to win but instead to help another player do so. We built a tool that allows decoder-only LLMs to participate as agents within the RLCard game environment. These models receive full game-state information and respond using simple text prompts under two distinct prompting strategies. We evaluate models ranging from small (1B parameters) to large (70B parameters) and explore how model scale impacts performance. We find that while all models were able to outperform a random baseline when playing UNO, few were able to significantly aid another player.
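To make the setup concrete, here is a minimal sketch of how an LLM might be wrapped as a helper agent. It is not the authors' tool. It assumes the RLCard custom-agent convention (a `use_raw` flag plus `step` and `eval_step` methods) and state keys such as `raw_obs` and `raw_legal_actions`. The names `HelperLLMAgent`, `build_prompt`, `parse_action`, and `partner_id`, and the prompt wording, are hypothetical. The `llm` argument is any callable that maps a prompt string to a reply string.

```python
from typing import Callable, Dict, List, Tuple


def build_prompt(raw_obs: Dict, legal_actions: List[str], partner_id: int) -> str:
    """Render the game state as plain text and ask for one legal action.

    The state keys used here ('hand', 'target', 'num_cards') are assumptions
    about the environment's raw observation, not a documented contract.
    """
    lines = [
        "You are playing UNO. Do not try to win yourself.",
        f"Your goal is to help player {partner_id} win.",
        f"Your hand: {', '.join(raw_obs.get('hand', []))}",
        f"Top card: {raw_obs.get('target', 'unknown')}",
        f"Cards per player: {raw_obs.get('num_cards', 'unknown')}",
        f"Legal actions: {', '.join(legal_actions)}",
        "Reply with exactly one legal action.",
    ]
    return "\n".join(lines)


def parse_action(reply: str, legal_actions: List[str]) -> str:
    """Return the legal action named in the reply, else the first legal action."""
    # Check longer action strings first so that 'r-draw_2' is not
    # mistaken for the plain 'draw' action it contains.
    for action in sorted(legal_actions, key=len, reverse=True):
        if action in reply:
            return action
    # Fallback keeps the agent rule-compliant when the reply is unusable.
    return legal_actions[0]


class HelperLLMAgent:
    """Hypothetical agent that asks an LLM to assist a chosen partner."""

    use_raw = True  # assumption: the environment then expects raw string actions

    def __init__(self, llm: Callable[[str], str], partner_id: int) -> None:
        self.llm = llm
        self.partner_id = partner_id

    def step(self, state: Dict) -> str:
        legal = list(state["raw_legal_actions"])
        prompt = build_prompt(state["raw_obs"], legal, self.partner_id)
        return parse_action(self.llm(prompt), legal)

    def eval_step(self, state: Dict) -> Tuple[str, Dict]:
        # Evaluation uses the same policy; the empty dict stands in for extra info.
        return self.step(state), {}
```

Restricting the parsed output to the legal-action list means a malformed reply can never produce an illegal move. The cost is that the fallback move is arbitrary and may not help the partner.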