Action Dependency Graphs for Globally Optimal Coordinated Reinforcement Learning

📅 2025-06-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses two key challenges in multi-agent reinforcement learning (MARL): the difficulty of achieving globally optimal coordinated decisions and the poor scalability of existing autoregressive policies. To this end, we propose the Action Dependency Graph (ADG), a novel non-autoregressive modeling framework that formally encodes sparse action dependencies among agents. Under the coordination graph constraint, we establish, for the first time, theoretical connections between ADG structure and global optimality, deriving verifiable sufficient conditions for optimality. Leveraging these insights, we design a tabular policy iteration algorithm with provable global optimality guarantees that is compatible with mainstream value-decomposition frameworks such as VDN and QMIX. Empirical evaluation demonstrates that our approach significantly improves scalability and robustness in large-scale cooperative tasks, effectively mitigating the sensitivity of conventional MARL algorithms to growth in the number of agents.
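
To make the core idea concrete, here is a minimal sketch of how an ADG might be represented and used to sample a joint action. The graph, agent indices, and random policy stubs below are illustrative assumptions, not the paper's implementation.

```python
# Minimal ADG sketch in a toy discrete setting (assumed for illustration).
from graphlib import TopologicalSorter
import random

# ADG as parents[i] = set of agents whose actions agent i observes.
# Sparse: agent 2 depends on agents 0 and 1; agents 0 and 1 act independently.
parents = {0: set(), 1: set(), 2: {0, 1}}

def sample_joint_action(policies, state, parents):
    """Sample a joint action in a topological order of the ADG.

    Each agent conditions only on the state and its ADG parents'
    actions; an autoregressive policy would instead condition agent i
    on the actions of *all* previously acting agents.
    """
    order = TopologicalSorter(parents).static_order()  # raises on cycles
    actions = {}
    for i in order:
        parent_actions = tuple(actions[j] for j in sorted(parents[i]))
        actions[i] = policies[i](state, parent_actions)
    return actions

# Illustrative stochastic policies over two actions {0, 1}.
policies = {i: (lambda s, pa: random.randint(0, 1)) for i in range(3)}
print(sample_joint_action(policies, state=0, parents=parents))
```

The sparse structure is what buys scalability: the conditioning set of each agent is its ADG parent set rather than the full prefix of agents, so the dependency depth is the longest path in the graph, not the number of agents.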

📝 Abstract
Action-dependent individual policies, which incorporate both environmental states and the actions of other agents in decision-making, have emerged as a promising paradigm for achieving global optimality in multi-agent reinforcement learning (MARL). However, the existing literature often adopts auto-regressive action-dependent policies, in which each agent's policy depends on the actions of all preceding agents. This formulation incurs substantial computational complexity as the number of agents increases, limiting scalability. In this work, we consider a more general class of action-dependent policies that do not necessarily follow the auto-regressive form. We propose the "action dependency graph" (ADG) to model inter-agent action dependencies. Within the context of MARL problems structured by coordination graphs, we prove that an action-dependent policy with a sparse ADG can achieve global optimality, provided the ADG satisfies conditions determined by the coordination graph. Building on this theoretical foundation, we develop a tabular policy iteration algorithm with guaranteed global optimality. Furthermore, we integrate our framework into several state-of-the-art (SOTA) algorithms and conduct experiments in complex environments. The empirical results affirm the robustness and applicability of our approach in more general scenarios, underscoring its potential for broader MARL challenges.
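
The abstract says global optimality holds when the ADG satisfies conditions specified by the coordination graph, but does not spell those conditions out here. As an illustration only, the sketch below checks one plausible requirement, assumed purely for demonstration: the ADG is acyclic and every coordination-graph edge {i, j} is covered by a directed ADG edge between i and j. This is not necessarily the paper's proven condition.

```python
# Hypothetical ADG-vs-coordination-graph check (assumed condition,
# for demonstration only; see the paper for the actual conditions).
from graphlib import TopologicalSorter, CycleError

def adg_covers_coordination_graph(adg_parents, cg_edges):
    """Return True if the ADG is acyclic and covers every CG edge."""
    try:
        list(TopologicalSorter(adg_parents).static_order())
    except CycleError:
        return False  # a cyclic dependency graph cannot be executed
    return all(
        i in adg_parents.get(j, set()) or j in adg_parents.get(i, set())
        for i, j in cg_edges
    )

# Line-shaped coordination graph 0-1-2: a chain ADG covers it without
# being fully autoregressive (agent 2 never needs agent 0's action).
adg = {0: set(), 1: {0}, 2: {1}}
print(adg_covers_coordination_graph(adg, [(0, 1), (1, 2)]))  # True
```
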
Problem

Research questions and friction points this paper is trying to address.

Modeling inter-agent action dependencies without auto-regressive constraints
Achieving global optimality in MARL with sparse action dependency graphs
Reducing computational complexity for scalable multi-agent reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Action Dependency Graphs model inter-agent dependencies
Sparse ADGs attain global optimality when they satisfy conditions set by the coordination graph
Tabular policy iteration algorithm with guaranteed global optimality (a minimal sketch follows below)
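
As a rough illustration of how an ADG-conditioned policy could plug into a value-decomposition framework, the sketch below performs greedy joint-action selection under an assumed VDN-style additive decomposition whose per-agent utilities condition on ADG parents. The table shapes, random utilities, and chain ADG are assumptions for illustration; in the paper it is the ADG conditions and the policy iteration scheme, not a single greedy pass like this, that guarantee global optimality.

```python
# Greedy selection along an ADG under an assumed additive decomposition:
# Q_tot(s, a) = sum_i Q_i(s, a_parents(i), a_i). Illustrative sketch only.
import numpy as np
from graphlib import TopologicalSorter

n_agents, n_actions, n_states = 3, 2, 4
parents = {0: set(), 1: {0}, 2: {1}}  # chain-shaped ADG
rng = np.random.default_rng(0)

# Q_i indexed by (state, flattened parent-action combo, own action);
# agents with no parents get a singleton parent axis.
q_tables = {
    i: rng.standard_normal((n_states, n_actions ** len(parents[i]), n_actions))
    for i in range(n_agents)
}

def parent_index(actions, ps):
    """Flatten the parents' chosen actions into a single table index."""
    idx = 0
    for j in sorted(ps):
        idx = idx * n_actions + actions[j]
    return idx

def greedy_joint_action(state):
    """Pick actions in topological ADG order, each agent maximizing its
    own utility given its parents' already-chosen actions."""
    actions = {}
    for i in TopologicalSorter(parents).static_order():
        pidx = parent_index(actions, parents[i])
        actions[i] = int(np.argmax(q_tables[i][state, pidx]))
    return actions

print(greedy_joint_action(state=0))
```
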
Jianglin Ding, School of Automation, Chongqing University
Jingcheng Tang, School of Automation, Chongqing University
Gangshan Jing, Chongqing University
Network systems · Control · Optimization · Reinforcement learning