GARL: Game-Theoretic Reinforcement Learning for Multi-Agent Strategic Prioritisation

📅 2026-06-03

🤖 AI Summary

Existing multi-agent reinforcement learning approaches to strategic decision-making tend to rely on task-specific rewards that are only weakly grounded in how agents interact, which limits policy optimization. This work proposes a two-stage game-theoretic framework: competing agents first allocate strategic resources over a shared candidate set, after which a higher-level arbiter produces a final ranking, and the resulting utilities are converted into role-specific reinforcement signals. The approach turns game-theoretic interaction structure into the objective of multi-agent reinforcement learning, enabling interaction-aware policy optimization. Evaluated on issues-in-dispute ranking in legal proceedings, the framework improves ranking performance, makes small open-source LLMs competitive with a strong closed-source LLM under the same candidate-ranking setting, and yields gains in legal-domain competence and broader strategic decision-making.

📝 Abstract

LLM-based multi-agent systems are increasingly used for strategic decision-making tasks. In such settings, performance depends not only on individual model capabilities, but also on the policies by which agents interact and adapt. Multi-agent reinforcement learning can optimise these interaction policies, but its reward design often remains task-specific and weakly grounded in interaction structure. To address this gap, we propose GARL, a GAme-theoretic Reinforcement Learning framework for multi-agent strategic prioritisation. GARL formalises strategic prioritisation as a two-stage game: competing agents first allocate strategic resources over a shared candidate set, and a higher-level arbiter then produces the final ranking. The resulting game-theoretic utilities are converted into role-specific reinforcement signals, allowing policy optimisation to be guided by structured interaction. We instantiate GARL on issues-in-dispute ranking, where the goal is to prioritise core issues in legal proceedings. Experiments show that GARL improves ranking performance, enables small open-source LLMs to become competitive with a strong closed-source LLM under the same candidate-ranking setting, and yields gains in legal-domain competence and broader strategic decision-making. Overall, GARL demonstrates how game-theoretic interaction structure can be turned into reinforcement-learning objectives, providing a principled approach to policy optimisation in multi-agent strategic prioritisation.
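
The two-stage structure described in the abstract can be illustrated with a toy sketch. The abstract does not give the utility functions, so everything below is an assumption made for illustration: the function names, the rule-based stand-in for the arbiter (in the paper the roles are played by LLMs), the rank-discounted agent utility, and the NDCG-style arbiter utility are all hypothetical and are not the authors' formulation.

```python
import math
from typing import Dict, List


def normalise(allocation: Dict[str, float]) -> Dict[str, float]:
    """Stage 1 helper: scale an agent's allocation to a unit budget."""
    total = sum(allocation.values())
    if total <= 0:
        raise ValueError("allocation must have a positive total")
    return {c: v / total for c, v in allocation.items()}


def arbiter_rank(allocations: List[Dict[str, float]]) -> List[str]:
    """Stage 2 (hypothetical stand-in): rank candidates by the total
    resources they attracted, breaking ties alphabetically."""
    candidates = set().union(*allocations)
    totals = {c: sum(a.get(c, 0.0) for a in allocations) for c in candidates}
    return sorted(totals, key=lambda c: (-totals[c], c))


def agent_utility(own: Dict[str, float], ranking: List[str]) -> float:
    """Assumed agent utility: resources placed on highly ranked
    candidates count more (logarithmic rank discount)."""
    return sum(own.get(c, 0.0) / math.log2(pos + 2)
               for pos, c in enumerate(ranking))


def arbiter_utility(ranking: List[str], gold: Dict[str, float]) -> float:
    """Assumed arbiter utility: NDCG of the final ranking against
    gold relevance labels."""
    dcg = sum(gold.get(c, 0.0) / math.log2(pos + 2)
              for pos, c in enumerate(ranking))
    ideal = sorted(gold.values(), reverse=True)
    idcg = sum(g / math.log2(pos + 2) for pos, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


def role_rewards(raw_allocations: Dict[str, Dict[str, float]],
                 gold: Dict[str, float]) -> Dict[str, float]:
    """Run both stages and return one scalar reward per role."""
    allocs = {role: normalise(a) for role, a in raw_allocations.items()}
    ranking = arbiter_rank(list(allocs.values()))
    rewards = {role: agent_utility(a, ranking) for role, a in allocs.items()}
    rewards["arbiter"] = arbiter_utility(ranking, gold)
    return rewards


if __name__ == "__main__":
    # Hypothetical issues A, B, C in a dispute, with made-up numbers.
    allocations = {
        "plaintiff": {"A": 3, "B": 1, "C": 0},
        "defendant": {"A": 1, "B": 1, "C": 2},
    }
    gold = {"A": 2, "B": 0, "C": 1}
    print(role_rewards(allocations, gold))
```

In a training loop, each role's scalar would serve as the reward for that role's policy update; the choice of update rule is left open here because the abstract does not specify it.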

Problem

Research questions and friction points this paper is trying to address.

multi-agent reinforcement learning

strategic prioritisation

game-theoretic interaction

reward design

LLM-based multi-agent systems

Innovation

Methods, ideas, or system contributions that make the work stand out.

Game-theoretic Reinforcement Learning

Multi-Agent Strategic Prioritisation

Role-Specific Reinforcement Signals