🤖 AI Summary
This work addresses the high computational and storage costs of existing influence function–based data attribution methods, which hinder efficient identification of how individual training samples contribute to undesirable model behaviors such as harmful outputs. To overcome this limitation, the authors propose Influcoder, a novel approach that, for the first time, distills gradient-based influence ranking information from the decoder into the encoder through knowledge distillation and architectural optimization. This enables highly efficient and compact data attribution for large language models while preserving attribution accuracy. By significantly reducing both computation time and memory requirements, Influcoder offers a scalable pathway toward interpretable and debuggable large-scale models.
📝 Abstract
As the capabilities of Large Language Models (LLMs) grow, there has been an increasing push to curate high-quality datasets by filtering samples in the training data. In general, Data Attribution (DA) methods aim to estimate how individual samples in a training dataset can precondition a model to generate certain outputs. For example, one might be interested in which training samples could be the source of toxic behavior in the trained LLM. Many methods quantify this conditioning through the paradigm of influence functions. While methods of this family are effective, they lack the processing speed and storage compactness needed for practical use on large datasets. We propose Influcoder, a fast and cost-effective approach to influence-based Data Attribution at scale.
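To make the influence-function paradigm mentioned in the abstract concrete, here is a minimal sketch of the classical estimate (test gradient, times inverse Hessian, times training gradient) on a toy problem. It is not Influcoder itself: the scalar model, the data, and the function names are all assumptions chosen for illustration, and the Hessian is a single number here, whereas for an LLM it is the costly part that motivates faster methods.

```python
# Hypothetical toy setup: scalar model y ≈ w * x with squared loss.
def grad(w, x, y):
    """Gradient of 0.5 * (w*x - y)**2 with respect to w."""
    return (w * x - y) * x


def fit(train):
    """Closed-form least-squares minimizer for the scalar model."""
    return sum(x * y for x, y in train) / sum(x * x for x, _ in train)


def influence_scores(train, test_point):
    """Estimate how upweighting each training sample changes the test loss."""
    w = fit(train)
    # Hessian of the mean training loss is the mean of x**2 (a scalar here).
    hessian = sum(x * x for x, _ in train) / len(train)
    g_test = grad(w, *test_point)
    # Classical influence: -g_test * H^{-1} * g_train for each training sample.
    return [-g_test * grad(w, x, y) / hessian for x, y in train]


# Illustrative data: the last sample is a deliberate outlier.
train = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (1.0, 4.0)]
scores = influence_scores(train, test_point=(2.0, 2.0))

# Rank training samples from most harmful (raises test loss) to most helpful.
ranking = sorted(range(len(train)), key=lambda i: scores[i], reverse=True)
```

Under this sign convention a positive score means that upweighting the sample is estimated to increase the test loss, so the outlier should appear first in `ranking`.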