AI Summary
This work challenges the conventional assumption in fine-tuning that semantically equivalent training prompts yield comparable performance, highlighting their critical yet overlooked impact on cross-task forgetting and generalization. To address this, the authors propose State-Adaptive Prompt Optimization (SAPO), a lightweight mechanism that dynamically adjusts the form of training prompts during learning. SAPO identifies high-quality prompts in real time based on pre-trained task losses and adaptively updates them according to the model's current training state. This approach substantially mitigates catastrophic forgetting and enhances generalization, consistently outperforming state-of-the-art methods across multiple benchmarks. The results underscore the pivotal role of training prompt design in learning dynamics and demonstrate its inherent optimizability.
Abstract
While prompt engineering is instrumental in maximizing the capabilities of Large Language Models (LLMs) during inference, the role of prompts during training remains critically underexplored. Prevailing fine-tuning paradigms typically treat training prompts as mere surface forms, assuming that semantically equivalent instructions yield identical learning outcomes. However, we reveal that this equivalence is deceptive: while paraphrased prompts often lead to comparable in-task performance, they induce drastically different cross-task impacts regarding catastrophic forgetting and generalization. Crucially, these impacts are positively correlated across tasks, indicating the existence of superior prompts that consistently yield better performance. Furthermore, we discover that these superior prompts can be robustly identified by task loss prior to learning. Leveraging these insights, we introduce State-Adaptive Prompt Optimization (SAPO), a lightweight yet effective training strategy that shifts task formulation from a static input to a dynamic, state-adaptive variable. Comprehensive experiments on diverse benchmarks confirm its effectiveness: SAPO significantly mitigates forgetting while improving generalization, achieving substantial performance gains over state-of-the-art methods. These results provide insights into how training prompts shape learning dynamics and offer a practical recipe for robust fine-tuning. Our code is available at https://github.com/Eric8932/SAPO.
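The abstract does not spell out the selection rule, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes that a lower task loss marks the better prompt (the abstract does not state the direction) and that the prompt is re-selected at a fixed interval. The names `select_prompt`, `prompt_schedule`, `refresh_every`, and `toy_loss` are hypothetical, and `toy_loss` stands in for a real model's loss on a task batch.

```python
from typing import Callable, Iterator, Sequence, Tuple


def select_prompt(candidates: Sequence[str], loss_fn: Callable[[str], float]) -> str:
    """Return the candidate prompt with the lowest loss (assumed selection rule)."""
    if not candidates:
        raise ValueError("candidates must be non-empty")
    return min(candidates, key=loss_fn)


def prompt_schedule(
    candidates: Sequence[str],
    loss_at_step: Callable[[int, str], float],
    total_steps: int,
    refresh_every: int,
) -> Iterator[Tuple[int, str]]:
    """Yield (step, prompt), re-selecting the prompt every `refresh_every` steps.

    `loss_at_step(step, prompt)` is assumed to report the task loss of the
    model in its state at `step` when the task is phrased with `prompt`.
    """
    if refresh_every <= 0:
        raise ValueError("refresh_every must be positive")
    current = ""
    for step in range(total_steps):
        if step % refresh_every == 0:
            # Re-score all paraphrases against the current training state.
            current = select_prompt(candidates, lambda p: loss_at_step(step, p))
        yield step, current


def toy_loss(step: int, prompt: str) -> float:
    """Hypothetical loss table: one paraphrase improves as training proceeds."""
    table = {"Summarize:": 2.0 - 0.4 * step, "TL;DR:": 1.0}
    return table[prompt]


if __name__ == "__main__":
    for step, prompt in prompt_schedule(["Summarize:", "TL;DR:"], toy_loss, 6, 2):
        print(step, prompt)
```

In a real fine-tuning loop, `loss_at_step` would run a forward pass over a small batch for each paraphrase, so the refresh interval trades selection freshness against extra compute.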