Phi-Actor-Critic: Steering General-Sum Games to Pareto-Efficient Correlated Equilibria

📅 2026-06-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In multi-agent games, the coexistence of multiple suboptimal Nash equilibria often prevents standard reinforcement learning from reaching coordinated outcomes with high social welfare. This work proposes Φ-Actor-Critic (Φ-AC), an approach that integrates swap regret minimization, a centralized attention-based critic, and Lagrangian equilibrium selection, combining scalable vector-valued regret estimation with social welfare optimization in a deep multi-agent reinforcement learning framework. The method steers agents toward Pareto-efficient correlated equilibria. On matrix games, Multi-Agent Particle Environments (MPE), and the Melting Pot Harvest task, it learns efficient and stable coordination while maintaining high collective return and competitive fairness.

📝 Abstract

Real-world multi-agent systems, from traffic coordination to resource allocation, are often modeled as general-sum games where individual incentives conflict with collective welfare. In these settings, the central challenge is not merely finding an equilibrium, but selecting socially desirable outcomes among many suboptimal Nash equilibria. Standard deep multi-agent reinforcement learning (MARL) methods struggle with this problem, as value-decomposition approaches are constrained by monotonicity assumptions and policy-gradient methods often converge to stable but socially inefficient equilibria. To address this limitation, we propose $Φ$-Actor-Critic ($Φ$-AC), a framework that leverages swap regret minimization to steer learning toward high-welfare correlated equilibria (CE). To make counterfactual regret estimation tractable in deep MARL, $Φ$-AC employs a centralized attention critic that predicts vector-valued regrets in a single forward pass, avoiding computationally expensive counterfactual simulations. We further introduce a Lagrangian-based equilibrium selection mechanism that optimizes social welfare while enforcing stability through regret constraints. Experiments on matrix games, Multi-Agent Particle Environments (MPE), and the Melting Pot Harvest scenario demonstrate that $Φ$-AC learns efficient and stable coordination strategies across diverse mixed-motive settings while maintaining high collective return and competitive fairness.
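To make the abstract's two central ideas concrete, here is a minimal tabular sketch in Python. It is not the paper's method: Φ-AC estimates regrets with a learned attention critic, whereas this example computes swap regret exactly for a hand-written two-player game of Chicken. The payoff values, the function names (`swap_regret`, `welfare`, `lagrangian`), and the penalty form of the Lagrangian are all illustrative assumptions.

```python
# Two-player Chicken (assumed payoffs); actions: 0 = Dare, 1 = Chicken.
# PAYOFFS[i][a0][a1] is player i's payoff for the joint action (a0, a1).
PAYOFFS = [
    [[0, 7], [2, 6]],  # player 0 (row)
    [[0, 2], [7, 6]],  # player 1 (column)
]
N_ACTIONS = 2


def payoff(player, own, other):
    """Payoff to `player` when it plays `own` and the opponent plays `other`."""
    return PAYOFFS[0][own][other] if player == 0 else PAYOFFS[1][other][own]


def swap_regret(mu, player):
    """Best expected gain from remapping each recommended action.

    mu[a0][a1] is a joint distribution over action pairs. A value of 0 for
    every player means mu is a correlated equilibrium.
    """
    total = 0.0
    for rec in range(N_ACTIONS):            # action recommended to `player`
        best_gain = 0.0                     # following the recommendation gains 0
        for dev in range(N_ACTIONS):        # candidate replacement action
            gain = 0.0
            for other in range(N_ACTIONS):
                p = mu[rec][other] if player == 0 else mu[other][rec]
                gain += p * (payoff(player, dev, other) - payoff(player, rec, other))
            best_gain = max(best_gain, gain)
        total += best_gain
    return total


def welfare(mu):
    """Expected sum of both players' payoffs under mu."""
    return sum(
        mu[a0][a1] * (PAYOFFS[0][a0][a1] + PAYOFFS[1][a0][a1])
        for a0 in range(N_ACTIONS)
        for a1 in range(N_ACTIONS)
    )


def lagrangian(mu, lam, eps=0.0):
    """Welfare minus a penalty on regret above eps (hypothetical penalty form)."""
    violation = sum(max(0.0, swap_regret(mu, i) - eps) for i in range(2))
    return welfare(mu) - lam * violation


# Uniform mix over (Dare, Chicken), (Chicken, Dare), (Chicken, Chicken).
CE = [[0.0, 1 / 3], [1 / 3, 1 / 3]]
# All mass on mutual cooperation: highest welfare, but each player wants to deviate.
COOP = [[0.0, 0.0], [0.0, 1.0]]

for name, mu in (("CE", CE), ("COOP", COOP)):
    print(name, welfare(mu), swap_regret(mu, 0), swap_regret(mu, 1), lagrangian(mu, 5.0))
```

In this toy game the always-cooperate distribution has higher welfare than the correlated equilibrium but positive swap regret, so with a large enough multiplier the penalized objective prefers the stable distribution. That is the trade-off the abstract's regret-constrained welfare optimization is meant to handle.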

Problem

Research questions and friction points this paper is trying to address.

general-sum games

Pareto-efficient correlated equilibria

multi-agent reinforcement learning

equilibrium selection

social welfare

Innovation

Methods, ideas, or system contributions that make the work stand out.

swap regret minimization

correlated equilibrium

centralized attention critic

Pareto efficiency

multi-agent reinforcement learning

🔎 Similar Papers

Convergence of Decentralized Actor-Critic Algorithm in General-Sum Markov Games

2024-09-06 | IEEE Control Systems Letters | Citations: 0