🤖 AI Summary
Existing methods struggle to effectively characterize the structural relationships among attention heads in Transformers. This work proposes a principal-angle-based projection kernel to measure the similarity between subspaces spanned by attention head weight matrices, leveraging this metric to construct an attention head relational graph that reveals the model's internal structure. To the best of our knowledge, this is the first application of projection kernels to quantify subspace affinity, accompanied by a novel framework for evaluating information content based on a reference distribution of random orthogonal subspaces. On the Indirect Object Identification (IOI) task, the proposed approach recovers known attention head interactions more clearly than existing metrics and identifies L4H7 in GPT2-small as an identity head functioning as an information hub.
📄 Abstract
Understanding relationships between attention heads is essential for interpreting the internal structure of Transformers, yet existing metrics do not capture this structure well. We focus on the subspaces spanned by attention-head weight matrices and quantify head-to-head relationships using the Projection Kernel (PK), a principal-angle-based measure of subspace similarity. Experiments show that PK reproduces known head-to-head interactions on the IOI task more clearly than prior metrics such as the Composition Score. We further introduce a framework to quantify the informativeness of PK distributions by comparing them with a reference distribution derived from random orthogonal subspaces. As an application, we analyze a directed graph constructed from PK and show that, in GPT2-small, L4H7 acts as a hub by functioning as an identity head.
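The abstract does not spell out the computation, but a principal-angle-based projection kernel between two heads' weight subspaces, together with the random-orthogonal-subspace reference distribution, can be sketched as follows. The function names, the QR-based basis extraction, and the Gaussian sampling of random subspaces are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def orthonormal_basis(W):
    """Orthonormal basis for the column space of W (thin QR; illustrative choice)."""
    Q, _ = np.linalg.qr(W)
    return Q

def projection_kernel(W1, W2):
    """Projection-kernel similarity between the column spaces of W1 and W2.

    Equals sum_i cos^2(theta_i) over the principal angles theta_i between
    the two subspaces, i.e. ||U1^T U2||_F^2 for orthonormal bases U1, U2.
    Ranges from 0 (orthogonal subspaces) to k (identical k-dim subspaces).
    """
    U1, U2 = orthonormal_basis(W1), orthonormal_basis(W2)
    # Singular values of U1^T U2 are the cosines of the principal angles.
    s = np.linalg.svd(U1.T @ U2, compute_uv=False)
    return float(np.sum(s ** 2))

def random_subspace_reference(d, k, n_samples=1000, seed=0):
    """Reference PK distribution for pairs of random k-dim subspaces of R^d.

    Sampling Gaussian matrices and orthonormalizing them yields subspaces
    drawn uniformly (Haar) from the Grassmannian; an observed head-to-head
    PK value can then be compared against this null distribution.
    """
    rng = np.random.default_rng(seed)
    return np.array([
        projection_kernel(rng.standard_normal((d, k)),
                          rng.standard_normal((d, k)))
        for _ in range(n_samples)
    ])
```

For example, two attention heads whose query-key weight matrices span nearly the same subspace would score close to k, while the reference distribution for random subspaces concentrates near k²/d, so values far above that baseline indicate genuine structural affinity.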