🤖 AI Summary
This work investigates the strategic reasoning and adaptive decision-making capabilities of large language models (LLMs) in dynamic, real-time, multi-agent collaborative settings. Existing evaluation paradigms are largely confined to static or turn-based environments and fail to capture the challenges of sustained, interactive coordination. To address this, we propose a novel framework integrating game-theoretic principles (specifically belief consistency and Nash equilibrium) with real-time, feedback-driven dynamic policy optimization. The framework enables robust, low-latency (<1.05 ms) decision-making and significantly improves collaborative efficiency under noisy conditions. Experiments demonstrate up to a 26% increase in task return over PPO baselines in high-noise environments, alongside concurrent gains in task completion rate and system resilience. Our core contribution is the first integration of verifiable game-theoretic equilibrium constraints into an LLM-driven, closed-loop, real-time multi-agent policy-update mechanism, unifying theoretical interpretability with engineering effectiveness.
📝 Abstract
Large language models (LLMs) demonstrate strong reasoning abilities across mathematical, strategic, and linguistic tasks, yet little is known about how well they reason in dynamic, real-time, multi-agent scenarios, such as cooperative gameplay settings in which agents continuously adapt to one another's behavior. In this paper, we bridge this gap by combining LLM-driven agents with strategic reasoning and real-time adaptation in cooperative multi-agent environments grounded in game-theoretic principles such as belief consistency and Nash equilibrium. The proposed framework applies broadly to dynamic scenarios in which agents coordinate, communicate, and make decisions under continuously changing conditions. In contrast to previous efforts that evaluate LLM capabilities in static or turn-based settings, we provide real-time strategy refinement and adaptive feedback mechanisms that enable agents to dynamically adjust policies based on immediate contextual interactions. Empirical results show that our method achieves up to a 26% improvement in return over PPO baselines in high-noise environments, while maintaining real-time latency under 1.05 milliseconds. Our approach improves collaboration efficiency, task completion rates, and flexibility, illustrating that game-theoretic guidance integrated with real-time feedback enhances LLM performance and ultimately fosters more resilient, flexible strategic multi-agent systems.
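The closed loop sketched in the abstract (observe joint outcomes, update beliefs about partners, and nudge each policy toward consistency with a best response) can be illustrated in miniature. The sketch below is not the paper's algorithm: the two-action coordination game, the moving-average belief update, and the best-response "consistency nudge" are all illustrative assumptions standing in for the framework's belief-consistency and equilibrium constraints.

```python
import math
import random

# Toy cooperative coordination game: both agents receive PAYOFF[a0][a1].
# Matching on action 0 is the payoff-dominant equilibrium (illustrative values).
PAYOFF = [[1.0, 0.0],
          [0.0, 0.6]]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def sample(probs, rng):
    r, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

def train(steps=3000, lr=0.1, belief_decay=0.99, noise=0.1, seed=0):
    rng = random.Random(seed)
    logits = [[0.0, 0.0], [0.0, 0.0]]    # one logit vector per agent
    beliefs = [[0.5, 0.5], [0.5, 0.5]]   # each agent's belief about its partner
    for _ in range(steps):
        probs = [softmax(l) for l in logits]
        acts = [sample(probs[i], rng) for i in range(2)]
        # Noisy shared reward models the high-noise feedback channel.
        reward = PAYOFF[acts[0]][acts[1]] + rng.gauss(0.0, noise)
        for i in range(2):
            j = 1 - i
            # Belief update: exponential moving average over partner actions.
            for a in range(2):
                obs = 1.0 if acts[j] == a else 0.0
                beliefs[i][a] = belief_decay * beliefs[i][a] + (1 - belief_decay) * obs
            # Expected payoff of each own action under the current belief.
            ev = [sum(beliefs[i][b] * (PAYOFF[a][b] if i == 0 else PAYOFF[b][a])
                      for b in range(2)) for a in range(2)]
            baseline = sum(p * e for p, e in zip(probs[i], ev))
            # REINFORCE-style update on the sampled action ...
            logits[i][acts[i]] += lr * (reward - baseline)
            # ... plus a consistency nudge toward the best response to the belief.
            br = max(range(2), key=lambda a: ev[a])
            logits[i][br] += lr * 0.5
    return [softmax(l) for l in logits]

policies = train()
```

Under these assumptions both agents converge on the payoff-dominant matching action, even with noisy rewards: the belief term keeps each policy's update anchored to a best response against the partner's observed behavior, which is the intuition behind the belief-consistency constraint.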