P2DT: Mitigating Forgetting in Task-Incremental Learning with Progressive Prompt Decision Transformer

📅 2024-01-22
🏛️ IEEE International Conference on Acoustics, Speech, and Signal Processing
📈 Citations: 1
Influential: 0
🤖 AI Summary
To address catastrophic forgetting in task-incremental reinforcement learning with agents controlled by large models, this paper proposes the Progressive Prompt Decision Transformer (P2DT). Methodologically, P2DT augments a Transformer-based policy with prompt learning over offline RL trajectories, enabling cross-task knowledge consolidation without fine-tuning the backbone network. Its key contributions are: (1) a dynamic, expandable decision-token mechanism that supports continual policy evolution while keeping the backbone and previously learned tokens frozen; and (2) the combination of trajectories collected via traditional reinforcement learning across all tasks with newly generated task-specific tokens for effective knowledge retention. Evaluated on multi-task RL benchmarks, P2DT achieves an average performance improvement of 37% over baselines and scales well as the number of tasks grows, mitigating forgetting without retraining the full model.


📝 Abstract
Catastrophic forgetting poses a substantial challenge for managing intelligent agents controlled by a large model, causing performance degradation when these agents face new tasks. In our work, we propose a novel solution: the Progressive Prompt Decision Transformer (P2DT). This method enhances a transformer-based model by dynamically appending decision tokens during new task training, thus fostering task-specific policies. Our approach mitigates forgetting in continual and offline reinforcement learning scenarios. Moreover, P2DT leverages trajectories collected via traditional reinforcement learning from all tasks and generates new task-specific tokens during training, thereby retaining knowledge from previous tasks. Preliminary results demonstrate that our model effectively alleviates catastrophic forgetting and scales well with increasing task environments.
Problem

Research questions and friction points this paper is trying to address.

Mitigating catastrophic forgetting in task-incremental learning scenarios
Addressing performance degradation when agents face new tasks
Retaining knowledge from previous tasks in continual reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Progressive Prompt Decision Transformer for task-incremental learning
Dynamically appends decision tokens during new task training
Leverages trajectories from all tasks to retain knowledge
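The dynamic decision-token mechanism described above can be sketched roughly as follows. This is a minimal illustration, not the paper's actual implementation: the class name `P2DTPromptPool`, the token count, and all dimensions are assumptions. The key idea it demonstrates is that each new task gets its own small set of learnable prompt tokens prepended to the trajectory sequence, while tokens from earlier tasks (and the frozen backbone) are left untouched, so old policies are not overwritten.

```python
import numpy as np

class P2DTPromptPool:
    """Sketch of an expandable per-task decision-token pool (hypothetical API)."""

    def __init__(self, embed_dim, tokens_per_task=5, seed=0):
        self.embed_dim = embed_dim
        self.tokens_per_task = tokens_per_task
        self.rng = np.random.default_rng(seed)
        # task_id -> array of shape (tokens_per_task, embed_dim)
        self.task_tokens = {}

    def add_task(self, task_id):
        # Only the new task's tokens would be trained; existing entries
        # and the Transformer backbone stay frozen in P2DT's setting.
        if task_id not in self.task_tokens:
            self.task_tokens[task_id] = 0.02 * self.rng.standard_normal(
                (self.tokens_per_task, self.embed_dim))
        return self.task_tokens[task_id]

    def prepend(self, task_id, trajectory_embeddings):
        # Prepend the task's decision tokens to the embedded
        # (return, state, action) trajectory sequence.
        return np.concatenate(
            [self.task_tokens[task_id], trajectory_embeddings], axis=0)

pool = P2DTPromptPool(embed_dim=8)
pool.add_task("walker")
pool.add_task("hopper")          # adding a task never modifies "walker" tokens
seq = pool.prepend("walker", np.zeros((10, 8)))
print(seq.shape)                 # (15, 8): 5 prompt tokens + 10 trajectory steps
```

In an actual Decision Transformer, the concatenated sequence would then be fed through the frozen backbone, and gradients would flow only into the current task's tokens.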
Zhiyuan Wang
Tsinghua Shenzhen International Graduate School, Tsinghua University, China
Xiaoyang Qu
Ping An Technology (Shenzhen) Co., Ltd., Shenzhen, China
Jing Xiao
Ping An Technology (Shenzhen) Co., Ltd., Shenzhen, China
Bokui Chen
Tsinghua Shenzhen International Graduate School, Tsinghua University, China
Jianzong Wang
Postdoctoral Researcher of Department of Electrical and Computer Engineering, University of Florida
Big Data · Storage System · Cloud Computing