🤖 AI Summary
Sparse autoencoders (SAEs) exhibit pathological "latent redundancy", in which the number of effectively activated features is far lower than the number of latents, particularly when neural activations lie on high-dimensional feature manifolds.
Method: We introduce a capacity allocation modeling framework that formalizes SAEs’ linear decomposition and sparse representation as a dynamic allocation of representational capacity over the underlying manifold geometry.
Contribution/Results: The model reproduces multi-stage scaling regimes and shows how manifold geometric properties, such as curvature and dimensional coupling, can suppress feature activation density and thereby induce latent redundancy. Preliminary analysis of activation data from large language models examines whether SAEs trained in practice fall into this pathological regime. The framework offers theoretical grounding and diagnostic tools for interpretable SAE modeling and architecture design, connecting manifold geometry with sparse coding behavior in neural representations.
📝 Abstract
Sparse autoencoders (SAEs) model the activations of a neural network as linear combinations of sparsely occurring directions of variation (latents). The ability of SAEs to reconstruct activations follows scaling laws with respect to the number of latents. In this work, we adapt a capacity-allocation model from the neural scaling literature (Brill, 2024) to understand SAE scaling, and in particular, to understand how "feature manifolds" (multi-dimensional features) influence scaling behavior. Consistent with prior work, the model recovers distinct scaling regimes. Notably, in one regime, feature manifolds have the pathological effect of causing SAEs to learn far fewer features in the data than there are latents in the SAE. We provide a preliminary discussion of whether SAEs are in this pathological regime in the wild.
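To make the setup concrete, the decomposition described above can be sketched as a single SAE forward pass: activations are encoded into a sparse, non-negative latent code, and reconstructed as a linear combination of decoder directions. This is a minimal illustrative sketch in NumPy; the dimensions, initialization, and ReLU encoder are assumptions for illustration, not the architecture used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, n_latents = 16, 64  # hypothetical sizes for illustration
W_enc = rng.standard_normal((d_model, n_latents)) / np.sqrt(d_model)
b_enc = np.zeros(n_latents)
W_dec = rng.standard_normal((n_latents, d_model)) / np.sqrt(n_latents)

def sae_forward(x):
    # Encoder: ReLU yields a sparse, non-negative latent code z.
    z = np.maximum(x @ W_enc + b_enc, 0.0)
    # Decoder: reconstruction is a linear combination of the rows of
    # W_dec (the "directions of variation" the SAE has learned).
    x_hat = z @ W_dec
    return z, x_hat

x = rng.standard_normal(d_model)   # a stand-in for a network activation
z, x_hat = sae_forward(x)
print(z.shape, x_hat.shape)        # latent code and reconstruction
```

Scaling studies like this one vary `n_latents` and track how reconstruction error falls; the paper's question is how many of those latents correspond to distinct features in the data.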