🤖 AI Summary
This work establishes the first rigorous theoretical guarantees for task vector editing in nonlinear Transformer models. To explain why task vector arithmetic (specifically, task addition and negation) is effective for model editing, the paper analyzes these operations within a conceptual binary classification framework, combining nonconvex optimization theory with generalization error bounds. It characterizes how task alignment or conflict governs editing performance: task addition provably generalizes in multi-task learning, task negation provably enables machine unlearning, and the choice of linear coefficients critically determines generalization to out-of-domain tasks. The theoretical analysis accommodates both full-parameter and low-rank weight updates. Empirical validation of machine unlearning on the Phi-1.5 (1.3B) model confirms the generalization capability of task vector editing, bridging the theoretical foundations with practical large language model editing.
📝 Abstract
Task arithmetic refers to editing a pre-trained model by adding a weighted sum of task vectors, each of which is the weight update from the pre-trained model to a model fine-tuned on a certain task. This approach has recently gained attention as a computationally efficient method for model editing, e.g., for multi-task learning, forgetting, and out-of-domain generalization. However, the theoretical understanding of why task vectors can execute these conceptual operations remains limited, due to the highly non-convex nature of training Transformer-based models. To the best of our knowledge, this paper provides the first theoretical characterization of the generalization guarantees of task vector methods on nonlinear Transformers. We consider a conceptual learning setting, where each task is a binary classification problem based on a discriminative pattern. We theoretically prove the effectiveness of task addition in simultaneously learning a set of irrelevant or aligned tasks, as well as the success of task negation in unlearning one task from irrelevant or contradictory tasks. Moreover, we prove that a proper selection of the linear coefficients for task arithmetic achieves guaranteed generalization to out-of-domain tasks. All of our theoretical results hold for both dense-weight parameters and their low-rank approximations. Although established in a conceptual setting, our theoretical findings are validated on a practical machine unlearning task using the large language model Phi-1.5 (1.3B).
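The task arithmetic operation described above can be illustrated with a minimal sketch. This is a hypothetical illustration, not the paper's implementation: it assumes model weights are represented as plain dicts mapping parameter names to lists of floats, and the function names (`task_vector`, `apply_task_arithmetic`) are made up for clarity.

```python
def task_vector(pretrained, finetuned):
    """A task vector is the weight update from the pre-trained model
    to a model fine-tuned on a certain task (finetuned minus pretrained)."""
    return {name: [f - p for p, f in zip(pretrained[name], finetuned[name])]
            for name in pretrained}

def apply_task_arithmetic(pretrained, task_vectors, coefficients):
    """Edit the pre-trained model by adding a weighted sum of task vectors.
    Positive coefficients perform task addition (multi-task learning);
    negative coefficients perform task negation (forgetting/unlearning)."""
    edited = {name: list(params) for name, params in pretrained.items()}
    for tv, lam in zip(task_vectors, coefficients):
        for name in edited:
            edited[name] = [w + lam * d for w, d in zip(edited[name], tv[name])]
    return edited
```

The paper's theoretical results concern exactly this family of edits: how the sign and magnitude of each coefficient, together with alignment or conflict among tasks, determine what the edited model generalizes to or forgets.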