🤖 AI Summary
This work studies regret lower bounds for the decentralized multi-agent stochastic shortest path (Dec-MASSP) problem under linear function approximation, where both the transition dynamics and the cost functions are represented by linear models. To characterize the structure of optimal policies in this setting, we develop a symmetry-based analytical framework and use it to construct the first hard-to-learn instances for Dec-MASSP, for any number of agents. We establish the first regret lower bound for Dec-MASSP, of order Ω(√K): any decentralized algorithm must incur cumulative regret of at least this order over K episodes of online interaction. This result highlights the inherent difficulty of decentralized multi-agent learning in stochastic shortest path environments and provides a theoretical benchmark for algorithm design. It fills a gap in the Dec-MASSP literature, where no lower bound was previously known.
📝 Abstract
Multi-agent systems (MAS) are central to applications such as swarm robotics and traffic routing, where agents must coordinate in a decentralized manner to achieve a common objective. Stochastic Shortest Path (SSP) problems provide a natural framework for modeling decentralized control in such settings. While learning in SSPs has been extensively studied in single-agent settings, the decentralized multi-agent variant remains largely unexplored. In this work, we take a step towards addressing that gap. We study decentralized multi-agent SSPs (Dec-MASSPs) under linear function approximation, where the transition dynamics and costs are represented using linear models. Applying novel symmetry-based arguments, we identify the structure of optimal policies. Our main contribution is the first regret lower bound for this setting, obtained by constructing hard-to-learn instances for any number of agents $n$. Our regret lower bound of $\Omega(\sqrt{K})$ over $K$ episodes highlights the inherent learning difficulty in Dec-MASSPs. These insights clarify the learning complexity of decentralized control and can guide the design of efficient learning algorithms for multi-agent systems.
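To make the lower-bound statement concrete, it can be read against the standard single-agent definition of episodic SSP regret. The sketch below assumes that form and is illustrative only. It is not taken from the paper, and the paper's multi-agent notation may differ. Here $c_{k,h}$ is the cost paid at step $h$ of episode $k$, $I_k$ is the (random) number of steps needed to reach the goal in episode $k$, and $V^\star(s_{\text{init}})$ is the optimal expected cost-to-goal from the initial state.

$$
R_K \;=\; \sum_{k=1}^{K} \Big( \sum_{h=1}^{I_k} c_{k,h} \;-\; V^\star(s_{\text{init}}) \Big),
\qquad
\text{lower bound: } \forall\,\text{algorithm } \exists\,\text{instance with } \mathbb{E}[R_K] \;\ge\; c\,\sqrt{K},
$$

where $c > 0$ is a constant that may depend on problem parameters (for example, the number of agents $n$) but not on $K$. Read this way, no decentralized learner can keep its expected regret growing more slowly than $\sqrt{K}$ on every instance.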