🤖 AI Summary
This work addresses the challenge of scaling multitask reinforcement learning to diverse tasks, where recent high-performing approaches rely on planning and complex training pipelines. The authors evaluate MR.Q, a lightweight model-free actor-critic algorithm that uses auxiliary predictive objectives to learn informative representations and pairs them with high-capacity value function approximation. This design enables effective multitask learning without explicit planning. On a diverse suite of multitask continuous-control tasks, MR.Q outperforms a recent world-model-based method and a range of deep RL baselines, while reducing computational overhead and improving wall-clock efficiency. Performance improves consistently with model capacity, and ablations indicate that predictive representation learning is critical.
📝 Abstract
Scaling reinforcement learning (RL) to diverse multitask settings remains a central challenge. While recent advances in model-based RL achieve strong performance, they rely on planning and complex training pipelines, making it unclear which components are essential for scalability. We revisit this question and argue that the primary driver of scalable multitask RL is not model-based control, but representation learning. In particular, we show that combining predictive, model-based representations with high-capacity value function approximation is sufficient to achieve strong performance, even without planning. We evaluate a simple model-free algorithm, MR.Q, which integrates auxiliary predictive objectives into a scalable actor-critic architecture. This approach outperforms a recent world-model-based method and a range of deep RL baselines across a diverse suite of multitask continuous control tasks, while significantly reducing computational overhead and improving wall-clock efficiency. We observe consistent improvements with increased model capacity and show through ablations that predictive representation learning is critical for performance.
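To make "predictive representations plus a value function, without planning" concrete, here is a minimal NumPy sketch of the kind of losses such an agent might compute on a batch of transitions. It is an illustration under stated assumptions, not the authors' implementation: the linear encoders, the dimensions, the unweighted sum of auxiliary losses, and the function name `predictive_losses` are all hypothetical. A state encoder produces z_s, a state-action encoder produces z_sa, and a linear head on z_sa predicts the next-state embedding, the reward, and termination. A linear critic reads the same z_sa and is trained with a one-step TD target. The learned model only shapes the shared embedding; it is never used to plan.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; not taken from the paper.
STATE_DIM, ACTION_DIM, Z_DIM, BATCH = 8, 2, 16, 32

# Hypothetical linear encoders and heads (a real agent would use deep networks).
W_state = rng.normal(scale=0.1, size=(STATE_DIM, Z_DIM))       # s -> z_s
W_sa = rng.normal(scale=0.1, size=(Z_DIM + ACTION_DIM, Z_DIM))  # (z_s, a) -> z_sa
W_model = rng.normal(scale=0.1, size=(Z_DIM, Z_DIM + 2))        # z_sa -> [z_next, r, done_logit]
w_q = rng.normal(scale=0.1, size=Z_DIM)                         # z_sa -> Q(s, a)


def encode_state(s):
    return np.tanh(s @ W_state)


def encode_state_action(z_s, a):
    return np.tanh(np.concatenate([z_s, a], axis=-1) @ W_sa)


def predictive_losses(s, a, r, s_next, done, q_next, gamma=0.99):
    """Auxiliary predictive losses plus a TD value loss on a shared embedding."""
    z_sa = encode_state_action(encode_state(s), a)
    pred = z_sa @ W_model
    z_next_hat, r_hat, done_logit = pred[:, :Z_DIM], pred[:, Z_DIM], pred[:, Z_DIM + 1]

    # Predict the next-state embedding (a target encoder with stop-gradient
    # would normally produce this target; here it is the same encoder).
    z_next_target = encode_state(s_next)
    dyn = np.mean(np.sum((z_next_hat - z_next_target) ** 2, axis=-1))

    # Predict reward (squared error) and termination (binary cross-entropy).
    rew = np.mean((r_hat - r) ** 2)
    p_done = 1.0 / (1.0 + np.exp(-done_logit))
    eps = 1e-7
    term = -np.mean(done * np.log(p_done + eps) + (1 - done) * np.log(1 - p_done + eps))

    # One-step TD loss for a linear critic on z_sa; q_next would come from a
    # target critic evaluated at the next state and next action.
    q = z_sa @ w_q
    td_target = r + gamma * (1 - done) * q_next
    value = np.mean((q - td_target) ** 2)

    return {
        "dynamics": float(dyn),
        "reward": float(rew),
        "done": float(term),
        "representation": float(dyn + rew + term),  # hypothetical equal weighting
        "value": float(value),
    }


# Usage on random transitions (placeholder data, not results).
s = rng.normal(size=(BATCH, STATE_DIM))
a = rng.uniform(-1.0, 1.0, size=(BATCH, ACTION_DIM))
r = rng.normal(size=BATCH)
s_next = rng.normal(size=(BATCH, STATE_DIM))
done = (rng.random(BATCH) < 0.05).astype(float)
out = predictive_losses(s, a, r, s_next, done, q_next=np.zeros(BATCH))
print(out)
```

Keeping the model head linear on z_sa is a choice made here so that the predictive structure has to live in the same embedding the critic reads. A practical agent would also add target networks, multi-step model unrolls, an actor update, and a task identifier for multitask inputs, all omitted from this sketch.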