Multi-Task Reinforcement Learning Enables Parameter Scaling

📅 2025-03-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
It remains unclear whether performance gains in multi-task reinforcement learning (MTRL) stem primarily from sophisticated architectural designs or merely from increased parameter count. Method: We conduct a systematic parameter scaling analysis, decoupling independent scaling of critic and actor networks, and investigate how task diversity affects training stability. Contributions/Results: (1) Under fixed compute budgets, a simple baseline scaling only parameters—especially the critic—significantly outperforms state-of-the-art complex architectures. (2) Critic scaling contributes substantially more to performance gains than actor scaling. (3) Increasing task diversity inherently mitigates plasticity loss, improving training stability and reducing plasticity degradation by over 40%. This work provides the first empirical evidence that parameter scale—not architectural complexity—is the primary driver of MTRL performance gains. It further reveals the critic’s dominant role and identifies task diversity as an intrinsic regularizer that enhances generalization and stability.

📝 Abstract
Multi-task reinforcement learning (MTRL) aims to endow a single agent with the ability to perform well on multiple tasks. Recent works have focused on developing novel sophisticated architectures to improve performance, often resulting in larger models; it is unclear, however, whether the performance gains are a consequence of the architecture design itself or the extra parameters. We argue that gains are mostly due to scale by demonstrating that naively scaling up a simple MTRL baseline to match parameter counts outperforms the more sophisticated architectures, and these gains benefit most from scaling the critic over the actor. Additionally, we explore the training stability advantages that come with task diversity, demonstrating that increasing the number of tasks can help mitigate plasticity loss. Our findings suggest that MTRL's simultaneous training across multiple tasks provides a natural framework for beneficial parameter scaling in reinforcement learning, challenging the need for complex architectural innovations.
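The abstract's central idea, scaling the critic and actor independently and comparing their contributions, can be illustrated with a rough parameter count. Below is a minimal sketch assuming plain MLP actor and critic networks; the widths, depths, and observation/action dimensions are illustrative assumptions, not the paper's actual configurations.

```python
# Hypothetical sketch: counting parameters when the critic and actor of a
# simple MLP-based actor-critic are scaled independently. All sizes below
# are illustrative assumptions, not the paper's reported settings.

def mlp_param_count(in_dim: int, hidden: int, depth: int, out_dim: int) -> int:
    """Parameters of an MLP with `depth` hidden layers of width `hidden`."""
    count = (in_dim + 1) * hidden                  # input layer (+ biases)
    count += (depth - 1) * (hidden + 1) * hidden   # remaining hidden layers
    count += (hidden + 1) * out_dim                # output layer
    return count

obs_dim, act_dim = 39, 4  # e.g. a Meta-World-style observation/action space

# Baseline: small actor and small Q-critic.
base_actor = mlp_param_count(obs_dim, 256, 2, act_dim)
base_critic = mlp_param_count(obs_dim + act_dim, 256, 2, 1)

# "Scale the critic" variant: widen only the critic, keep the actor fixed.
scaled_critic = mlp_param_count(obs_dim + act_dim, 1024, 2, 1)

print(f"actor: {base_actor}, critic: {base_critic}, scaled critic: {scaled_critic}")
```

Because hidden-to-hidden weight matrices grow quadratically with width, widening only the critic concentrates the extra parameter budget on the value side, which is the axis the paper reports as most productive to scale.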
Problem

Research questions and friction points this paper is trying to address.

Are performance gains in MTRL driven by architectural design or simply by increased parameter count?
Can naively scaling parameters in a simple MTRL baseline match or exceed sophisticated architectures?
How does task diversity affect training stability and plasticity loss in MTRL?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Scaling simple MTRL outperforms complex architectures
Critic scaling benefits performance more than actor
Task diversity enhances training stability and plasticity
Reginald McLean
Department of Computer Science, Toronto Metropolitan University, Canada; Farama Foundation
Evangelos Chataroulas
Department of Computer Science, University of Surrey, United Kingdom
Jordan Terry
Farama Foundation
Isaac Woungang
Toronto Metropolitan University (TMU)
Next generation wireless networks; Network security
N. Farsad
Department of Computer Science, Toronto Metropolitan University, Canada
Pablo Samuel Castro
Google
Reinforcement Learning; Machine Learning; Artificial Intelligence; Creativity; Music