One Model for All Tasks: Leveraging Efficient World Models in Multi-Task Planning

📅 2025-09-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
In heterogeneous multi-task learning, significant disparities in observation/action spaces and task difficulty cause gradient conflicts and model plasticity degradation. To address this, we propose ScaleZero—a Mixture-of-Experts (MoE)-based multi-task world model. Its core innovation is a LoRA-driven dynamic parameter expansion mechanism that enables on-demand expert activation and low-rank incremental updates, balancing computational efficiency with continual knowledge retention. ScaleZero integrates online reinforcement learning, dynamic parameter scaling, and task-adaptive expert routing. Evaluated on Atari, DeepMind Control Suite (DMControl), and Jericho benchmarks, it matches or exceeds the performance of task-specialized single-task models. Remarkably, it achieves competitive performance using only 80% of the environment interaction steps required by baseline methods, demonstrating substantial improvements in both sample and computational efficiency.
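The summary describes LoRA-driven low-rank incremental updates: instead of fine-tuning a full weight matrix, a frozen base weight is augmented with a trainable low-rank product. A minimal NumPy sketch of that mechanism (the sizes and the `lora_forward` name are illustrative, not taken from the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, r, alpha = 16, 16, 4, 8  # illustrative layer sizes and LoRA rank

# Frozen base weight (stays fixed while the adapter is trained).
W = rng.normal(size=(d_out, d_in))

# Low-rank adapter: only A and B are trained, adding 2*r*d parameters
# instead of d_out*d_in, which is what keeps the expansion cheap.
A = rng.normal(scale=0.01, size=(r, d_in))
B = np.zeros((d_out, r))  # zero-init so the adapter starts as a no-op

def lora_forward(x, W, A, B, alpha, r):
    """Apply the effective weight W + (alpha / r) * B @ A to a batch x."""
    return x @ (W + (alpha / r) * B @ A).T

x = rng.normal(size=(2, d_in))
y = lora_forward(x, W, A, B, alpha, r)
# With B zero-initialized, the adapted layer matches the frozen base layer,
# so newly attached adapters do not disturb previously learned behavior.
assert np.allclose(y, x @ W.T)
```

Because `B` starts at zero, attaching a new adapter leaves the model's outputs unchanged until training moves `B` away from zero, which is one common way such on-demand expansion preserves prior knowledge.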

📝 Abstract
In heterogeneous multi-task learning, tasks not only exhibit diverse observation and action spaces but also vary substantially in intrinsic difficulty. While conventional multi-task world models like UniZero excel in single-task settings, we find that when handling large-scale heterogeneous environments, gradient conflicts and the loss of model plasticity often constrain their sample and computational efficiency. In this work, we address these challenges from two perspectives: the single learning iteration and the overall learning process. First, we investigate the impact of key design spaces on extending UniZero to multi-task planning. We find that a Mixture-of-Experts (MoE) architecture provides the most substantial performance gains by mitigating gradient conflicts, leading to our proposed model, ScaleZero. Second, to dynamically balance the computational load across the learning process, we introduce an online, LoRA-based dynamic parameter scaling (DPS) strategy. This strategy progressively integrates LoRA adapters in response to task-specific progress, enabling adaptive knowledge retention and parameter expansion. Empirical evaluations on standard benchmarks such as Atari, DMControl (DMC), and Jericho demonstrate that ScaleZero, relying exclusively on online reinforcement learning with one model, attains performance on par with specialized single-task baselines. Furthermore, when augmented with our dynamic parameter scaling strategy, our method achieves competitive performance while requiring only 80% of the single-task environment interaction steps. These findings underscore the potential of ScaleZero for effective large-scale multi-task learning. Our code is available at https://github.com/opendilab/LightZero.
Problem

Research questions and friction points this paper is trying to address.

Addressing gradient conflicts in multi-task world models
Mitigating loss of model plasticity in heterogeneous tasks
Improving sample and computational efficiency in planning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture-of-Experts architecture mitigates gradient conflicts
Dynamic parameter scaling with LoRA adapters
Single model achieves multi-task performance efficiently
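The first innovation bullet attributes the gain to an MoE architecture routing different tasks to different experts, so their gradients no longer collide in one shared parameter set. A minimal sketch of top-k expert routing, the standard gating mechanism behind such architectures (the function names and sizes are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def top_k_route(h, W_gate, k=2):
    """Return (expert indices, renormalized weights) per token.

    h: (tokens, d) hidden states; W_gate: (d, n_experts) gating weights.
    Each token is dispatched to its k highest-scoring experts, so tokens
    from different tasks can exercise disjoint experts.
    """
    scores = softmax(h @ W_gate)                 # (tokens, n_experts)
    idx = np.argsort(scores, axis=-1)[:, -k:]    # indices of the top-k experts
    w = np.take_along_axis(scores, idx, axis=-1)
    w = w / w.sum(axis=-1, keepdims=True)        # renormalize over chosen experts
    return idx, w

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 8))        # 4 tokens, hidden size 8
W_gate = rng.normal(size=(8, 6))   # gate over 6 experts
idx, w = top_k_route(h, W_gate, k=2)
# Each token's mixture weights over its chosen experts sum to 1.
assert np.allclose(w.sum(axis=-1), 1.0)
```

In a full model, each selected expert's output would be combined with these weights; only the chosen experts receive gradients for a given token, which is the mechanism by which conflicting task gradients are kept apart.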
Yuan Pu
Shanghai Artificial Intelligence Laboratory
Yazhe Niu
Shanghai Artificial Intelligence Laboratory, The Chinese University of Hong Kong
Jia Tang
Shanghai Artificial Intelligence Laboratory, Nanjing University of Aeronautics and Astronautics
Junyu Xiong
Shanghai Artificial Intelligence Laboratory, University of Science and Technology of China
Shuai Hu
Siberian Branch of the Russian Academy of Sciences
Hongsheng Li
The Chinese University of Hong Kong, Centre for Perceptual and Interactive Intelligence