Distributed Value Decomposition Networks with Networked Agents

📅 2025-02-11
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses decentralized multi-agent reinforcement learning (MARL) under partial observability without centralized training: heterogeneous or homogeneous agents must cooperatively maximize the joint cumulative reward using only local observations and peer-to-peer communication. To this end, the authors propose DVDN and its enhanced variant DVDN(GT), the first algorithms to achieve fully decentralized value decomposition. DVDN employs locally parameterized Q-functions, a shared target consistency constraint, and a local target estimation mechanism to align each agent's local Q-value with the global Q-value, thereby mitigating the information loss induced by limited communication. Evaluated on ten tasks from three standard MARL benchmarks, DVDN(GT) achieves performance comparable to centrally trained VDN while significantly improving effectiveness and robustness in communication-constrained settings.

๐Ÿ“ Abstract
We investigate the problem of distributed training under partial observability, whereby cooperative multi-agent reinforcement learning agents (MARL) maximize the expected cumulative joint reward. We propose distributed value decomposition networks (DVDN) that generate a joint Q-function that factorizes into agent-wise Q-functions. Whereas the original value decomposition networks rely on centralized training, our approach is suitable for domains where centralized training is not possible and agents must learn by interacting with the physical environment in a decentralized manner while communicating with their peers. DVDN overcomes the need for centralized training by locally estimating the shared objective. We contribute with two innovative algorithms, DVDN and DVDN (GT), for the heterogeneous and homogeneous agents settings respectively. Empirically, both algorithms approximate the performance of value decomposition networks, in spite of the information loss during communication, as demonstrated in ten MARL tasks in three standard environments.
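The additive factorization described in the abstract (a joint Q-function that decomposes into agent-wise Q-functions) can be sketched in a few lines of Python. All function names, shapes, and the linear Q-heads below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def agent_q(weights, obs):
    """Per-agent Q-values: a toy linear head over a local observation."""
    return obs @ weights  # shape: (n_actions,)

def joint_q(all_weights, all_obs, joint_action):
    """VDN-style factorization: Q_tot(s, a) = sum_i Q_i(o_i, a_i)."""
    return sum(
        agent_q(w, o)[a]
        for w, o, a in zip(all_weights, all_obs, joint_action)
    )

rng = np.random.default_rng(0)
n_agents, obs_dim, n_actions = 3, 4, 2
weights = [rng.normal(size=(obs_dim, n_actions)) for _ in range(n_agents)]
obs = [rng.normal(size=obs_dim) for _ in range(n_agents)]
action = [0, 1, 0]

q_tot = joint_q(weights, obs, action)
parts = [agent_q(w, o)[a] for w, o, a in zip(weights, obs, action)]
assert np.isclose(q_tot, sum(parts))  # the additive decomposition holds
```

Because the joint value is a sum, each agent can act greedily on its own Q-function and the team is jointly greedy with respect to Q_tot, which is what makes the factorization attractive for decentralized execution.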
Problem

Research questions and friction points this paper is trying to address.

Distributed training under partial observability
Decentralized multi-agent reinforcement learning
Local estimation of shared objective
Innovation

Methods, ideas, or system contributions that make the work stand out.

Distributed Value Decomposition Networks
Decentralized multi-agent training
Local shared objective estimation
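The local estimation of the shared objective in the GT variant can be illustrated with a generic gradient-tracking consensus update over a communication graph. The mixing matrix, step size, and update rule below are a hedged sketch of standard gradient tracking under assumed names, not the paper's exact algorithm:

```python
import numpy as np

def gradient_tracking_step(W, x, y, g_new, g_old, lr=0.05):
    """One gradient-tracking update over a communication graph.

    W     : doubly stochastic mixing matrix (n_agents x n_agents)
    x     : local parameter estimates        (n_agents x dim)
    y     : local trackers of the average gradient (n_agents x dim)
    """
    x_next = W @ x - lr * y          # mix with neighbors, then descend
    y_next = W @ y + (g_new - g_old)  # track the network-average gradient
    return x_next, y_next

n_agents, dim = 3, 2
# Fully connected mixing matrix: 0.5 self-weight, 0.25 per neighbor.
W = np.full((n_agents, n_agents), 0.25) + 0.25 * np.eye(n_agents)
g = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])  # fixed local gradients
x = np.zeros((n_agents, dim))
y = g.copy()  # standard initialization: y_0 = g_0

for _ in range(200):
    x, y = gradient_tracking_step(W, x, y, g, g)  # gradients unchanged here

# Trackers reach consensus on the network-average gradient.
assert np.allclose(y, g.mean(axis=0), atol=1e-6)
```

The key property is that each agent's tracker y_i converges to the average of all local gradients using only neighbor-to-neighbor mixing, which is how a shared training signal can be estimated without any centralized coordinator.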
Guilherme S. Varela
Instituto Superior Técnico, INESC-ID, Lisbon, Portugal
A. Sardinha
PUC-Rio, Rio de Janeiro, Brazil
Francisco S. Melo
INESC-ID / Instituto Superior Técnico
Reinforcement learning, inverse reinforcement learning, machine learning, planning in single and multiagent systems