Gradient-Sign Masking for Task Vector Transport Across Pre-Trained Models

📅 2025-10-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: When a foundation model is updated, task vectors trained on the older version cannot be transferred directly to the newer one because the two parameter spaces are misaligned, so practitioners typically repeat costly full fine-tuning. Method: We propose GradFix, a gradient-guided task vector adaptation method that estimates the sign structure of the gradients on the new model from only a handful of labeled samples, then applies a sign-based mask to rebase the source task vector onto the new pre-training, enabling cross-version transfer without additional fine-tuning. Contribution/Results: GradFix comes with a theoretical guarantee of first-order descent. On vision and language benchmarks, it consistently outperforms naive task vector addition and few-shot fine-tuning.

📝 Abstract

When a new release of a foundation model is published, practitioners typically need to repeat full fine-tuning, even if the same task has already been solved in the previous version. A promising alternative is to reuse the parameter changes (i.e., task vectors) that capture how a model adapts to a specific task. However, they often fail to transfer across different pre-trained models due to their misaligned parameter space. In this work, we show that the key to successful transfer lies in the sign structure of the gradients of the new model. Based on this insight, we propose GradFix, a novel method that approximates the ideal gradient sign structure and leverages it to transfer knowledge using only a handful of labeled samples. Notably, this requires no additional fine-tuning: the adaptation is achieved by computing a few gradients at the target model and masking the source task vector accordingly. This yields an update that is locally aligned with the target loss landscape, effectively rebasing the task vector onto the new pre-training. We provide a theoretical guarantee that our method ensures first-order descent. Empirically, we demonstrate significant performance gains on vision and language benchmarks, consistently outperforming naive task vector addition and few-shot fine-tuning.
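The masking step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes the mask keeps a task-vector coordinate only when its sign agrees with the descent direction (the negative gradient) at the target model, and it uses a hypothetical quadratic loss with an exact gradient in place of a gradient estimated from a few labeled samples. All function and variable names are made up for illustration.

```python
# Toy sketch of gradient-sign masking (assumed rule, not the paper's code).
# Assumption: keep tau_i only where sign(tau_i) == sign(-g_i), i.e. tau_i * g_i < 0.

def quadratic_loss(theta, optimum):
    """Hypothetical stand-in for the target-task loss."""
    return 0.5 * sum((t - o) ** 2 for t, o in zip(theta, optimum))

def quadratic_grad(theta, optimum):
    """Exact gradient of the toy loss; a real setup would estimate it from a few samples."""
    return [t - o for t, o in zip(theta, optimum)]

def sign_mask_task_vector(task_vector, target_grad):
    """Zero out entries that do not point along the descent direction of the target loss."""
    return [tau if tau * g < 0 else 0.0 for tau, g in zip(task_vector, target_grad)]

def apply_update(theta, update, alpha=1.0):
    """Add a scaled update to the parameters."""
    return [t + alpha * u for t, u in zip(theta, update)]

theta_new = [0.0, 0.0, 0.0]      # weights of the new pre-trained model (toy values)
optimum = [1.0, -1.0, 1.0]       # minimizer of the toy target loss
task_vector = [0.5, 0.5, -0.5]   # task vector from the old model, partly misaligned

grad = quadratic_grad(theta_new, optimum)
masked = sign_mask_task_vector(task_vector, grad)

loss_before = quadratic_loss(theta_new, optimum)
loss_naive = quadratic_loss(apply_update(theta_new, task_vector), optimum)
loss_masked = quadratic_loss(apply_update(theta_new, masked), optimum)

print(masked)                                # [0.5, 0.0, 0.0]
print(loss_before, loss_naive, loss_masked)  # 1.5 2.375 1.125
```

In this toy case, naive addition raises the loss because two coordinates of the task vector point uphill on the target loss, while the masked update keeps only the coordinate that points downhill and lowers it.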

Problem

Research questions and friction points this paper is trying to address.

Transferring task vectors across different pre-trained model versions

Addressing parameter space misalignment in model adaptation

Enabling knowledge reuse without repeating full fine-tuning

Innovation

Methods, ideas, or system contributions that make the work stand out.

GradFix method transfers task vectors using gradient signs

Masks the source task vector using the gradient signs of the target model

Achieves adaptation without additional fine-tuning steps
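
A short derivation suggests why a sign-based mask can yield first-order descent. It is a sketch under assumptions, not the paper's theorem: the symbols $\theta_B$ (target pre-trained weights), $\tau$ (source task vector), $g = \nabla L(\theta_B)$ (target gradient), $m$ (mask), and $\alpha > 0$ (step size) are introduced here for illustration, and the mask is assumed to keep only coordinates whose sign agrees with the negative gradient.

```latex
m_i = \mathbb{1}\left[\tau_i \, g_i < 0\right], \qquad
L(\theta_B + \alpha \, m \odot \tau)
  = L(\theta_B) + \alpha \sum_i m_i \, \tau_i \, g_i + O(\alpha^2), \qquad
m_i \, \tau_i \, g_i \le 0 \ \text{ for all } i .
```

Every term in the sum is non-positive, so for sufficiently small $\alpha$ the masked update does not increase the target loss to first order, and it strictly decreases it whenever at least one coordinate is kept.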
