🤖 AI Summary
Whether layer-wise activations across large language models (LLMs) with heterogeneous architectures share alignable representational geometry remains unclear.
Method: We propose a systematic framework based on nearest-neighbor graphs and high-dimensional geometric similarity measures to enable cross-model layer alignment and depth-normalized comparison across 24 open-source LLMs.
Contribution/Results: We discover, for the first time, that activation spaces at matched normalized depths exhibit highly consistent local neighborhood structures, forming robust, layerwise-evolving geometric patterns. Crucially, normalized depth, not absolute layer index, predicts cross-model activation similarity: nearest-neighbor matching accuracy at equivalent depths significantly exceeds both random baselines and cross-depth controls. This reveals an implicit, shared computational pathway across diverse LLMs, establishing a geometric foundation for model alignment, knowledge transfer, and interpretability research.
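The core comparison described above can be sketched in a few lines: build k-nearest-neighbor sets from each model's activations over a shared set of inputs, map layers to a common normalized-depth scale, and score how well neighbor sets agree at matched depths versus a random baseline. This is a minimal illustrative sketch, not the paper's implementation; the function names (`knn_sets`, `knn_overlap`, `matched_layer`), the choice of cosine similarity, and the Jaccard overlap score are all assumptions for the example.

```python
import numpy as np

def knn_sets(acts, k=10):
    # acts: (n_samples, d) activations of one layer over a shared input set.
    # Returns each sample's k-nearest-neighbor index set under cosine similarity.
    normed = acts / np.linalg.norm(acts, axis=1, keepdims=True)
    sims = normed @ normed.T
    np.fill_diagonal(sims, -np.inf)  # exclude self-matches
    idx = np.argsort(-sims, axis=1)[:, :k]
    return [set(row) for row in idx]

def knn_overlap(acts_a, acts_b, k=10):
    # Mean Jaccard overlap between the neighbor sets induced by two layers,
    # possibly from different models with different hidden dimensions.
    sets_a, sets_b = knn_sets(acts_a, k), knn_sets(acts_b, k)
    return float(np.mean([len(a & b) / len(a | b) for a, b in zip(sets_a, sets_b)]))

def matched_layer(depth_frac, n_layers):
    # Map a normalized depth in [0, 1] to a layer index, so models with
    # different layer counts can be compared at "the same" relative depth.
    return round(depth_frac * (n_layers - 1))

# Toy demo: two "models" whose activations share latent structure should show
# higher neighbor overlap than a layer paired with unrelated random activations.
rng = np.random.default_rng(0)
shared = rng.normal(size=(50, 16))                    # shared latent structure
model_a = shared + 0.05 * rng.normal(size=(50, 16))   # model A's layer (16-d)
model_b = shared @ rng.normal(size=(16, 32))          # model B's layer (32-d)
unrelated = rng.normal(size=(50, 32))                 # random baseline
```

The Jaccard overlap is one of several reasonable neighbor-agreement scores; since it compares only neighbor *index sets*, it works across models with different hidden sizes without any learned alignment map.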
📝 Abstract
How do the latent spaces used by independently trained LLMs relate to one another? We study the nearest neighbor relationships induced by activations at different layers of 24 open-weight LLMs, and find that they 1) tend to vary from layer to layer within a model, and 2) are approximately shared between corresponding layers of different models. Claim 2 shows that these nearest neighbor relationships are not arbitrary, as they are shared across models, but Claim 1 shows that they are not "obvious" either, as there is no single set of nearest neighbor relationships that is universally shared. Together, these suggest that LLMs generate a progression of activation geometries from layer to layer, but that this entire progression is largely shared between models, stretched and squeezed to fit into different architectures.