🤖 AI Summary
To address catastrophic forgetting in task-incremental reinforcement learning, where agents controlled by large models degrade on earlier tasks as new tasks arrive, this paper proposes the Progressive Prompt-based Decision Transformer (P2DT). Methodologically, P2DT combines a Transformer architecture with prompt engineering and trajectory distillation, enabling cross-task knowledge consolidation without fine-tuning the backbone network. Its key contributions are: (1) a dynamic, expandable decision-token mechanism that supports continual policy evolution while keeping the backbone parameters frozen; and (2) the joint use of offline RL trajectories and newly generated task-specific prompts for effective knowledge retention. Evaluated on multi-task RL benchmarks, P2DT achieves an average performance improvement of 37% over baselines while scaling well as the number of tasks increases. The approach substantially mitigates forgetting without sacrificing architectural efficiency or requiring full model retraining.
📝 Abstract
Catastrophic forgetting poses a substantial challenge for intelligent agents controlled by a large model, causing performance degradation when these agents face new tasks. In our work, we propose a novel solution: the Progressive Prompt Decision Transformer (P2DT). This method enhances a transformer-based model by dynamically appending decision tokens during new task training, thus fostering task-specific policies. Our approach mitigates forgetting in continual and offline reinforcement learning scenarios. Moreover, P2DT leverages trajectories collected via traditional reinforcement learning from all tasks and generates new task-specific tokens during training, thereby retaining knowledge from previous tasks. Preliminary results demonstrate that our model effectively alleviates catastrophic forgetting and scales well with increasing task environments.
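To make the mechanism concrete, the core idea of appending learnable decision tokens per task to a frozen transformer backbone can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class name `PromptPool`, the dimensions, and the use of a generic `nn.TransformerEncoder` as a stand-in for the Decision Transformer are all assumptions.

```python
import torch
import torch.nn as nn

class PromptPool(nn.Module):
    """Hypothetical sketch of P2DT-style task prompts (names/sizes assumed).

    A shared transformer backbone is frozen; each new task only adds a small
    set of learnable prompt (decision) tokens that are prepended to the
    embedded trajectory sequence, so old tasks' parameters are never touched.
    """

    def __init__(self, d_model: int = 64, prompt_len: int = 4):
        super().__init__()
        self.d_model = d_model
        self.prompt_len = prompt_len
        self.prompts = nn.ParameterDict()  # task id -> learnable prompt tokens
        # Frozen shared backbone (stand-in for the Decision Transformer).
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        for p in self.backbone.parameters():
            p.requires_grad = False  # backbone is never fine-tuned

    def add_task(self, task_id: str) -> None:
        # Expand the model: new tokens for the new task, old ones untouched.
        self.prompts[task_id] = nn.Parameter(
            torch.randn(self.prompt_len, self.d_model) * 0.02)

    def forward(self, task_id: str, traj_tokens: torch.Tensor) -> torch.Tensor:
        # traj_tokens: (batch, seq_len, d_model), embedded trajectory tokens.
        batch = traj_tokens.size(0)
        prompt = self.prompts[task_id].unsqueeze(0).expand(batch, -1, -1)
        # Prepend task prompt, then run the frozen backbone.
        return self.backbone(torch.cat([prompt, traj_tokens], dim=1))

model = PromptPool()
model.add_task("walker")
out = model("walker", torch.randn(2, 10, 64))  # shape: (2, 14, 64)
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
```

After `add_task`, the only trainable parameters are the new task's prompt tokens (`trainable == ["prompts.walker"]`), which is what allows continual policy growth without disturbing knowledge stored in the backbone or in earlier tasks' prompts.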