🤖 AI Summary
It remains unclear whether performance gains in multi-task reinforcement learning (MTRL) stem primarily from sophisticated architectural designs or merely from increased parameter count.
Method: We conduct a systematic parameter-scaling analysis, scaling the critic and actor networks independently, and investigate how task diversity affects training stability.
Contributions/Results: (1) Under fixed compute budgets, a simple baseline that only scales parameters (especially the critic's) significantly outperforms state-of-the-art complex architectures. (2) Critic scaling contributes substantially more to performance gains than actor scaling. (3) Increasing task diversity inherently mitigates plasticity loss, reducing it by over 40% and improving training stability. This work provides the first empirical evidence that parameter scale, not architectural complexity, is the primary driver of MTRL performance gains. It further reveals the critic's dominant role and identifies task diversity as an intrinsic regularizer that enhances generalization and stability.
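To make "scaling the critic independently of the actor" concrete, here is a minimal sketch (not the paper's code; network widths, depths, and dimensions are illustrative assumptions) that counts MLP parameters when only the critic is widened, showing where a parameter-matched simple baseline spends its budget:

```python
def mlp_params(in_dim, hidden, depth, out_dim):
    """Parameter count (weights + biases) of a fully connected MLP."""
    dims = [in_dim] + [hidden] * depth + [out_dim]
    return sum(d_in * d_out + d_out for d_in, d_out in zip(dims, dims[1:]))

# Assumed observation/action sizes for illustration only.
obs_dim, act_dim = 39, 4

# Baseline actor and Q-critic, both width 400.
actor_base  = mlp_params(obs_dim, 400, 3, act_dim)
critic_base = mlp_params(obs_dim + act_dim, 400, 3, 1)

# Decoupled scaling: widen only the critic (here to width 1024),
# leaving the actor untouched.
critic_wide = mlp_params(obs_dim + act_dim, 1024, 3, 1)

print(f"actor:       {actor_base:,}")
print(f"critic base: {critic_base:,}")
print(f"critic wide: {critic_wide:,} ({critic_wide / critic_base:.1f}x)")
```

Widening one network roughly quadratically increases its hidden-layer parameters, so allocating width to the critic alone dominates the total budget, which is the knob the summary's critic-vs-actor comparison varies.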
📝 Abstract
Multi-task reinforcement learning (MTRL) aims to endow a single agent with the ability to perform well on multiple tasks. Recent works have focused on developing novel sophisticated architectures to improve performance, often resulting in larger models; it is unclear, however, whether the performance gains are a consequence of the architecture design itself or the extra parameters. We argue that gains are mostly due to scale by demonstrating that naively scaling up a simple MTRL baseline to match parameter counts outperforms the more sophisticated architectures, and these gains benefit most from scaling the critic over the actor. Additionally, we explore the training stability advantages that come with task diversity, demonstrating that increasing the number of tasks can help mitigate plasticity loss. Our findings suggest that MTRL's simultaneous training across multiple tasks provides a natural framework for beneficial parameter scaling in reinforcement learning, challenging the need for complex architectural innovations.
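The abstract's claim that task diversity mitigates plasticity loss presupposes a way to measure plasticity. The paper's exact metric is not stated here; one common proxy from the RL literature is the dormant-neuron ratio (Sokar et al.), sketched below with illustrative thresholds and synthetic activations:

```python
import numpy as np

def dormant_fraction(activations, tau=0.025):
    """Fraction of 'dormant' neurons in one layer.

    activations: (batch, n_neurons) post-ReLU activations.
    A neuron counts as dormant when its mean |activation|,
    normalized by the layer-wide average, falls below tau.
    """
    score = np.abs(activations).mean(axis=0)   # per-neuron activity
    norm = score / (score.mean() + 1e-8)       # layer-normalized score
    return float((norm <= tau).mean())

# Synthetic example: a 128-neuron layer where 32 neurons have died.
rng = np.random.default_rng(0)
acts = np.maximum(rng.normal(size=(256, 128)), 0.0)
acts[:, :32] = 0.0
print(dormant_fraction(acts))
```

Tracking such a ratio over training is one way the stability claim (fewer neurons going dormant as the number of tasks grows) could be quantified.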