Factored Value Functions for Graph-Based Multi-Agent Reinforcement Learning

📅 2026-01-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of credit assignment in large-scale multi-agent systems, where conventional global or local value functions often fail to provide effective learning signals under partial observability and local interaction structure. To this end, the authors propose the Diffusion Value Function (DVF), which combines temporal discounting with spatial attenuation over an influence graph to assign each agent a value component that balances global consistency with local estimability. Building on DVF, they develop the Diffusion A2C (DA2C) algorithm together with a sparse-communication policy network, the Learned DropEdge Graph Neural Network (LD-GNN), enabling stable decentralised learning over infinite horizons. Experiments show that DA2C improves average reward by up to 11% over baseline methods on a firefighting benchmark and three distributed coordination tasks.

📝 Abstract
Credit assignment is a core challenge in multi-agent reinforcement learning (MARL), especially in large-scale systems with structured, local interactions. Graph-based Markov decision processes (GMDPs) capture such settings via an influence graph, but standard critics are poorly aligned with this structure: global value functions provide weak per-agent learning signals, while existing local constructions can be difficult to estimate and ill-behaved in infinite-horizon settings. We introduce the Diffusion Value Function (DVF), a factored value function for GMDPs that assigns to each agent a value component by diffusing rewards over the influence graph with temporal discounting and spatial attenuation. We show that DVF is well-defined, admits a Bellman fixed point, and decomposes the global discounted value via an averaging property. DVF can be used as a drop-in critic in standard RL algorithms and estimated scalably with graph neural networks. Building on DVF, we propose Diffusion A2C (DA2C) and a sparse message-passing actor, Learned DropEdge GNN (LD-GNN), for learning decentralised algorithms under communication costs. Across the firefighting benchmark and three distributed computation tasks (vector graph colouring and two transmit power optimisation problems), DA2C consistently outperforms local and global critic baselines, improving average reward by up to 11%.
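The abstract describes the DVF as diffusing rewards over the influence graph with temporal discounting and spatial attenuation, such that averaging the per-agent components recovers the global discounted value. A minimal sketch of one plausible form of this construction, assuming a row-stochastic diffusion matrix `P` as the spatial-attenuation operator (the paper's exact operator and the `diffusion_values` helper are assumptions, not the authors' implementation):

```python
import numpy as np

def diffusion_values(rewards, P, gamma=0.9, horizon=200):
    """Hedged sketch of per-agent diffusion values.

    rewards: (n,) per-agent instantaneous rewards, held fixed here for
             illustration (in practice they come from rollouts).
    P:       (n, n) row-stochastic diffusion matrix over the influence
             graph, modelling spatial attenuation.
    """
    v = np.zeros(len(rewards))
    diffused = np.asarray(rewards, dtype=float)
    for t in range(horizon):
        v += (gamma ** t) * diffused   # temporal discounting
        diffused = P @ diffused        # one step of spatial diffusion
    return v

# Illustration of the averaging property under this assumed form: when P
# is doubly stochastic, the mean of the per-agent components equals the
# global discounted value of the mean reward.
P = np.full((3, 3), 1 / 3)          # hypothetical 3-agent influence graph
r = np.array([1.0, 0.0, 2.0])
v = diffusion_values(r, P)           # v.mean() ~ r.mean() / (1 - 0.9) = 10
```

In this sketch each agent's component still reflects its own reward more strongly than its neighbours' (spatial attenuation), while the average recovers the global signal, which is the balance between local estimability and global consistency that the abstract highlights.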
Problem

Research questions and friction points this paper is trying to address.

credit assignment
multi-agent reinforcement learning
graph-based MDPs
value function decomposition
infinite-horizon
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion Value Function
Graph-based MARL
Credit Assignment
Graph Neural Networks
Decentralized Reinforcement Learning