🤖 AI Summary
Function vectors (FVs) are task representations elicited during in-context learning that can be used to steer large language models, yet key design choices in their construction, specifically attention head selection and the steering mechanism, have not been systematically investigated. This work uses gradient-based attribution with Layer-wise Relevance Propagation (LRP) to identify the attention heads that contribute most to a given task, and it applies the FV in a distributed manner instead of through simple aggregation. For FVs defined for instructions, LRP-based head selection substantially improves both efficiency and accuracy, and distributed steering yields higher accuracy than simple aggregation.
📝 Abstract
Function vectors (FVs) are task representations elicited during in-context learning that can be used to steer Large Language Models (LLMs). However, design choices in their formulation remain underexplored. In this work, we study the impact of varying FV definitions for instructions along two degrees of freedom: attention head selection and steering. For head selection, using gradient-based attributions with Layer-wise Relevance Propagation (LRP) substantially improves efficiency as well as accuracy. For FV steering, applying the FV in a distributed manner yields higher accuracy than simple aggregation. Our code is publicly available.
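The two degrees of freedom can be illustrated with a toy NumPy sketch. It is not the authors' implementation: the attribution scores are random stand-ins for LRP scores, `mean_head_out` is a hypothetical array of per-head mean outputs, and reading "distributed" as "one steering vector per layer" is an assumption made only for this example. All function names are hypothetical.

```python
import numpy as np

def select_top_heads(scores, k):
    """Return the k (layer, head) pairs with the highest attribution score."""
    flat = np.argsort(scores, axis=None)[::-1][:k]
    layers, heads = np.unravel_index(flat, scores.shape)
    return [(int(l), int(h)) for l, h in zip(layers, heads)]

def aggregated_fv(mean_head_out, heads):
    """Simple aggregation: sum the selected heads' outputs into one vector."""
    return sum(mean_head_out[l, h] for l, h in heads)

def distributed_fv(mean_head_out, heads):
    """Assumed 'distributed' variant: keep one summed vector per layer."""
    per_layer = {}
    for l, h in heads:
        per_layer[l] = per_layer.get(l, 0) + mean_head_out[l, h]
    return per_layer

rng = np.random.default_rng(0)
n_layers, n_heads, d_model = 4, 3, 8

# Stand-in for per-head attribution scores (e.g. from LRP); random here.
scores = rng.normal(size=(n_layers, n_heads))
# Stand-in for each head's mean output over task prompts; random here.
mean_head_out = rng.normal(size=(n_layers, n_heads, d_model))

top = select_top_heads(scores, k=5)
fv = aggregated_fv(mean_head_out, top)            # one vector, added at one layer
fv_by_layer = distributed_fv(mean_head_out, top)  # one vector per selected layer
```

In this reading, the aggregated FV would be added to the hidden state at a single layer, while the per-layer vectors would each be added at their own layer. The per-layer vectors sum to the aggregated FV, so the two variants differ only in where the steering signal is injected.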