🤖 AI Summary
In multi-agent reinforcement learning (MARL), parameter sharing improves training efficiency but hinders agent specialization in heterogeneous environments, degrading overall performance. To address this, we propose LoRASA: a lightweight framework that injects agent-specific low-rank, sparse adapter matrices into a shared policy backbone; to our knowledge, this is the first integration of low-rank sparse adaptation into MARL policy parameterization that jointly preserves coordination and specialization. Built atop the MAPPO and A2PO frameworks, LoRASA combines low-rank matrix decomposition with a hierarchical adapter architecture for parameter-efficient fine-tuning. Extensive experiments on the SMAC and MAMuJoCo benchmarks demonstrate that LoRASA matches or outperforms state-of-the-art baselines while significantly reducing memory footprint and computational overhead. These results validate LoRASA's effectiveness, its generalizability across diverse tasks, and its scalability to larger agent populations.
📝 Abstract
Multi-agent reinforcement learning (MARL) often relies on *parameter sharing (PS)* to scale efficiently. However, purely shared policies can stifle each agent's unique specialization, reducing overall performance in heterogeneous environments. We propose **Low-Rank Agent-Specific Adaptation (LoRASA)**, a novel approach that treats each agent's policy as a specialized "task" fine-tuned from a shared backbone. Drawing inspiration from parameter-efficient transfer methods, LoRASA appends small, low-rank adaptation matrices to each layer of the shared policy, naturally inducing *parameter-space sparsity* that promotes both specialization and scalability. We evaluate LoRASA on challenging benchmarks including the StarCraft Multi-Agent Challenge (SMAC) and Multi-Agent MuJoCo (MAMuJoCo), implementing it atop widely used algorithms such as MAPPO and A2PO. Across diverse tasks, LoRASA matches or outperforms existing baselines *while reducing memory and computational overhead*. Ablation studies on adapter rank, placement, and timing validate the method's flexibility and efficiency. Our results suggest LoRASA's potential to establish a new norm for MARL policy parameterization: combining a shared foundation for coordination with low-rank agent-specific refinements for individual specialization.
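The core mechanism described above (a shared layer weight plus a small per-agent low-rank correction) can be sketched in a few lines of numpy. This is a minimal illustration under assumed shapes and names, not the paper's implementation: `lora_forward`, the rank `r`, and the zero-initialized `B` matrices are illustrative choices in the style of standard LoRA adapters.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=1.0):
    """Shared linear layer with a low-rank agent-specific correction:
    y = x @ (W + alpha * A @ B), where A: (d_in, r), B: (r, d_out),
    and r << min(d_in, d_out), so the adapter adds only r*(d_in + d_out)
    parameters per agent instead of a full d_in * d_out copy."""
    return x @ W + alpha * (x @ A) @ B

rng = np.random.default_rng(0)
d_in, d_out, r, n_agents = 64, 32, 4, 3

# Shared backbone weight, trained jointly across all agents.
W = rng.standard_normal((d_in, d_out)) * 0.05

# Per-agent adapters; B starts at zero so each agent's initial policy
# coincides exactly with the shared policy before any specialization.
adapters = [(rng.standard_normal((d_in, r)) * 0.05, np.zeros((r, d_out)))
            for _ in range(n_agents)]

obs = rng.standard_normal((1, d_in))
outputs = [lora_forward(obs, W, A, B) for A, B in adapters]

# With B = 0, every agent initially reproduces the shared policy output.
assert all(np.allclose(y, obs @ W) for y in outputs)
```

Once `B` receives gradient updates, each agent's effective weight `W + A @ B` diverges from the shared backbone only in a rank-`r` subspace, which is the parameter-space sparsity the abstract refers to.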