🤖 AI Summary
Existing methods struggle to effectively characterize the structural relationships among attention heads in Transformers. This work proposes a principal-angle-based projection kernel to measure the similarity between subspaces spanned by attention head weight matrices, leveraging this metric to construct an attention head relational graph that reveals the model's internal structure. To the best of our knowledge, this is the first application of projection kernels to quantify subspace affinity, accompanied by a novel framework for evaluating information content based on a reference distribution of random orthogonal subspaces. On the Indirect Object Identification (IOI) task, the proposed approach recovers known attention head interactions more clearly than existing metrics and identifies L4H7 in GPT2-small as an identity head functioning as an information hub.
📄 Abstract
Understanding relationships between attention heads is essential for interpreting the internal structure of Transformers, yet existing metrics do not capture this structure well. We focus on the subspaces spanned by attention-head weight matrices and quantify head-to-head relationships using the Projection Kernel (PK), a principal-angle-based measure of subspace similarity. Experiments show that PK reproduces known head-to-head interactions on the IOI task more clearly than prior metrics such as the Composition Score. We further introduce a framework to quantify the informativeness of PK distributions by comparing them with a reference distribution derived from random orthogonal subspaces. As an application, we analyze a directed graph constructed from PK and show that, in GPT2-small, L4H7 acts as a hub by functioning as an identity head.
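The abstract does not spell out the computation, but a principal-angle-based projection kernel between two heads' weight subspaces, together with the random-orthogonal-subspace reference distribution, can be sketched as follows. The function names, the QR-based basis extraction, and the Gaussian sampling of random subspaces are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def orthonormal_basis(W):
    """Orthonormal basis for the column space of W (thin QR; illustrative choice)."""
    Q, _ = np.linalg.qr(W)
    return Q

def projection_kernel(W1, W2):
    """Projection-kernel similarity between the column spaces of W1 and W2.

    Equals sum_i cos^2(theta_i) over the principal angles theta_i between
    the two subspaces, i.e. ||U1^T U2||_F^2 for orthonormal bases U1, U2.
    Ranges from 0 (orthogonal subspaces) to k (identical k-dim subspaces).
    """
    U1, U2 = orthonormal_basis(W1), orthonormal_basis(W2)
    # Singular values of U1^T U2 are the cosines of the principal angles.
    s = np.linalg.svd(U1.T @ U2, compute_uv=False)
    return float(np.sum(s ** 2))

def random_subspace_reference(d, k, n_samples=1000, seed=0):
    """Reference PK distribution for pairs of random k-dim subspaces of R^d.

    Sampling Gaussian matrices and orthonormalizing them yields subspaces
    drawn uniformly (Haar) from the Grassmannian; an observed head-to-head
    PK value can then be compared against this null distribution.
    """
    rng = np.random.default_rng(seed)
    return np.array([
        projection_kernel(rng.standard_normal((d, k)),
                          rng.standard_normal((d, k)))
        for _ in range(n_samples)
    ])
```

For example, two attention heads whose query-key weight matrices span nearly the same subspace would score close to k, while the reference distribution for random subspaces concentrates near k²/d, so values far above that baseline indicate genuine structural affinity.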