🤖 AI Summary
Problem: When a foundation model is updated, task vectors trained on the older version often fail to transfer directly to the newer one because the two parameter spaces are misaligned, which forces costly full fine-tuning.
Method: We propose GradFix, a gradient-guided task vector adaptation method that estimates the sign structure of the new model's gradients from only a few labeled samples, then applies a sign-based mask to rebase the source task vector onto the new model. The transfer needs a handful of labeled samples but no fine-tuning, so it is few-shot rather than zero-shot.
Contribution/Results: GradFix is presented as the first method to migrate task vectors across model versions without fine-tuning, and it comes with a theoretical guarantee of first-order descent. It combines gradient sign analysis with masking of the source task vector in parameter space. Experiments on vision and language benchmarks show significant gains, with GradFix consistently outperforming naive task vector addition and few-shot fine-tuning.
📝 Abstract
When a new release of a foundation model is published, practitioners typically need to repeat full fine-tuning, even if the same task has already been solved on the previous version. A promising alternative is to reuse the parameter changes (i.e., task vectors) that capture how a model adapts to a specific task. However, task vectors often fail to transfer across different pre-trained models because their parameter spaces are misaligned. In this work, we show that the key to successful transfer lies in the sign structure of the gradients of the new model. Based on this insight, we propose GradFix, a novel method that approximates the ideal gradient sign structure and leverages it to transfer knowledge using only a handful of labeled samples. Notably, this requires no additional fine-tuning: the adaptation is achieved by computing a few gradients at the target model and masking the source task vector accordingly. This yields an update that is locally aligned with the target loss landscape, effectively rebasing the task vector onto the new pre-trained model. We provide a theoretical guarantee that our method ensures first-order descent. Empirically, we demonstrate significant performance gains on vision and language benchmarks, consistently outperforming naive task vector addition and few-shot fine-tuning.
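The masking step described above can be illustrated with a minimal NumPy sketch. The abstract does not spell out the exact masking rule, so the rule used here is an assumption: keep only those entries of the source task vector whose sign is opposite to the sign of the averaged few-shot gradient at the target model, and zero the rest. The function names (`sign_mask`, `rebase_task_vector`), the scaling factor `alpha`, and the toy linear-regression setup are all hypothetical and chosen for illustration only; this is not the authors' implementation.

```python
import numpy as np


def sign_mask(task_vector, grads):
    """Assumed rule: keep entries whose sign opposes the mean gradient's sign."""
    g = np.mean(grads, axis=0)  # average the few per-sample gradients
    # Keep coordinate i only if moving along task_vector[i] is a descent
    # direction to first order, i.e. task_vector[i] * g[i] < 0.
    keep = (np.sign(task_vector) == -np.sign(g)) & (g != 0)
    return keep.astype(task_vector.dtype), g


def rebase_task_vector(theta_target, task_vector, grads, alpha=1.0):
    """Add the masked source task vector to the target parameters (no fine-tuning)."""
    mask, g = sign_mask(task_vector, grads)
    masked = mask * task_vector
    return theta_target + alpha * masked, masked, g


# Toy setup (hypothetical): linear regression with 4 labeled samples.
rng = np.random.default_rng(0)
dim = 8
theta_target = rng.normal(size=dim)  # stands in for the new pre-trained weights
theta_star = rng.normal(size=dim)    # unknown task solution, used to make labels
X = rng.normal(size=(4, dim))
y = X @ theta_star
# Per-sample gradients of 0.5 * (x @ theta - y)**2 evaluated at theta_target.
grads = (X @ theta_target - y)[:, None] * X
tau_source = rng.normal(size=dim)    # stands in for a task vector from the old model

theta_new, masked_tau, g_mean = rebase_task_vector(theta_target, tau_source, grads, alpha=0.1)
print("kept coordinates:", int(np.count_nonzero(masked_tau)), "of", dim)
print("first-order term <g, masked_tau>:", float(g_mean @ masked_tau))
```

Under this assumed rule, every kept coordinate contributes a non-positive term to the inner product between the mean gradient and the masked task vector, so the first-order change in the few-shot loss is never positive. That is one simple way to read the first-order descent guarantee; the paper's actual statement and conditions may differ.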