AI Summary
Existing collaborative multi-agent reinforcement learning (MARL) frameworks lack flexible, scalable, end-to-end trainable architectures, often relying on fixed topologies or centralized training paradigms. Method: We propose Reinforcement Networks (RN), the first MARL framework to unify agent systems as arbitrary directed acyclic graphs (DAGs), enabling modular, hierarchical, and graph-structured coordination. RN introduces a DAG-driven agent organization, end-to-end gradient propagation across the graph, graph-aware policy optimization, a novel collaboration-aware credit assignment algorithm, and LevelEnv, an environment abstraction for reproducible evaluation. Contribution/Results: Experiments show that RN consistently outperforms state-of-the-art baselines across diverse cooperative MARL benchmarks, improving task performance, scalability, and structural expressiveness simultaneously. RN establishes a new paradigm for structured, scalable MARL grounded in principled graph-based representation and learning.
Abstract
Modern AI systems often comprise multiple learnable components that can be naturally organized as graphs. A central challenge is training such systems end to end without restrictive architectural or training assumptions. Such tasks fall within the scope of collaborative Multi-Agent Reinforcement Learning (MARL). We introduce Reinforcement Networks, a general MARL framework that organizes agents as vertices of a directed acyclic graph (DAG). This structure extends hierarchical RL to arbitrary DAGs, enabling flexible credit assignment and scalable coordination while avoiding strict topologies, fully centralized training, and other limitations of current approaches. We formalize training and inference for the Reinforcement Networks framework and connect it to the LevelEnv concept to support reproducible construction, training, and evaluation. We demonstrate the effectiveness of our approach on several collaborative MARL setups, developing Reinforcement Networks models that outperform standard MARL baselines. Beyond empirical gains, Reinforcement Networks unify hierarchical, modular, and graph-structured views of MARL, opening a principled path toward designing and training complex multi-agent systems. We conclude with theoretical and practical directions: richer graph morphologies, compositional curricula, and graph-aware exploration, positioning Reinforcement Networks as a foundation for a new line of research in scalable, structured MARL.
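To make the DAG-of-agents organization concrete, here is a minimal, hypothetical sketch (all class and function names are illustrative assumptions; the abstract does not specify the framework's actual API). It arranges agents as vertices of a directed acyclic graph and computes a topological order, so that each agent can be evaluated only after all of its upstream parents:

```python
# Hypothetical sketch of agents organized as DAG vertices.
# None of these names come from the Reinforcement Networks paper itself.
from dataclasses import dataclass, field


@dataclass
class AgentNode:
    """One agent, stored as a vertex with edges to its upstream parents."""
    name: str
    parents: list = field(default_factory=list)


def topological_order(nodes):
    """Return agents sorted so every parent precedes its children."""
    order, seen = [], set()

    def visit(node):
        if node.name in seen:
            return
        for parent in node.parents:
            visit(parent)  # parents are emitted before the node itself
        seen.add(node.name)
        order.append(node)

    for node in nodes:
        visit(node)
    return order


# A small diamond-shaped agent graph: a -> b, a -> c, {b, c} -> d.
a = AgentNode("a")
b = AgentNode("b", parents=[a])
c = AgentNode("c", parents=[a])
d = AgentNode("d", parents=[b, c])

order = [node.name for node in topological_order([d, b, a, c])]
```

Evaluating agents in this order is what makes a forward pass (and, symmetrically, end-to-end gradient propagation in reverse) well defined on an arbitrary DAG, in contrast to the fixed tree of a strict hierarchy.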