Rank Collapse, Fixed Points, and the Renormalization Group Structure of MLP Residual Networks

📅 2026-06-08

🤖 AI Summary

This work addresses the lack of quantitative validation for the analogy between deep neural network forward propagation and renormalization group (RG) flow, in particular the absence of a measurable RG order parameter and of empirical evidence under controlled inputs. The study trains pure MLP residual networks on masked token prediction over synthetic Markov chains with known spectral properties, proposes effective rank as an RG order parameter, and combines position-level representation tracking with inter-layer kernel drift analysis to characterize how representations evolve with depth. The findings are that effective rank decreases monotonically with depth, but the collapse is pronounced only for inputs with short correlation lengths; layer-wise changes concentrate in one or two transition layers, while the remaining layers sit near fixed-point plateaus. These results indicate that MLP residual networks perform selective coarse-graining guided by the input spectrum, and the authors present them as the first quantitative, position-level evidence for the analogy between RG and deep learning.
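The two measurements named in the summary can be sketched as follows. This is a minimal illustration, not the paper's implementation. It assumes effective rank means the exponential of the Shannon entropy of the normalized singular values, and it assumes kernel drift means one minus linear CKA between consecutive layers. The listing confirms neither definition, and the function names `effective_rank`, `linear_cka`, and `kernel_drift` are hypothetical.

```python
import numpy as np

def effective_rank(acts, center=True):
    """Entropy-based effective rank of an (n_samples, d_model) matrix.

    Assumed definition: exp(H(p)), where p are the singular values
    normalized to sum to 1. The paper may use a different variant.
    """
    X = np.asarray(acts, dtype=np.float64)
    if center:
        X = X - X.mean(axis=0, keepdims=True)
    s = np.linalg.svd(X, compute_uv=False)
    total = s.sum()
    if total == 0.0:
        return 0.0
    p = s / total
    p = p[p > 1e-12]  # drop numerically zero directions before taking logs
    return float(np.exp(-np.sum(p * np.log(p))))

def linear_cka(a, b):
    """Linear CKA similarity between two (n_samples, d) activation matrices."""
    A = a - a.mean(axis=0, keepdims=True)
    B = b - b.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(A.T @ B, "fro") ** 2
    norm = np.linalg.norm(A.T @ A, "fro") * np.linalg.norm(B.T @ B, "fro")
    return float(cross / norm)

def kernel_drift(layers):
    """Assumed drift measure: 1 - CKA between each pair of adjacent layers."""
    return [1.0 - linear_cka(layers[i], layers[i + 1])
            for i in range(len(layers) - 1)]
```

Under these assumptions, `layers` would be a list of per-position activation matrices, one per residual block. A fixed-point plateau would show up as drift values near zero, and a transition layer as an isolated spike.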

📝 Abstract

The analogy between deep neural network forward passes and renormalization group (RG) flows has been repeatedly noted in the literature, but existing treatments remain qualitative: depth is described as a coarse-graining scale, attention is likened to a partition function, and representations are said to flow toward fixed points. No existing work has defined a measurable RG order parameter, tested it under controlled variation of the input distribution, or made quantitative predictions that are empirically verified. We study the simplest architecture for which the analogy is tractable: a pure MLP residual stack trained on masked token prediction over synthetic Markov chain sequences with known spectral properties. We report three findings. (i) The effective rank of the residual stream decreases monotonically with depth after training, consistent with progressively integrating out irrelevant degrees of freedom. (ii) This rank collapse is selective: it occurs for chains with a short correlation length (approximately 1) but is absent for chains with a long correlation length (approximately 7), as measured at the position level to control for mean-pooling artifacts. The network preserves exactly the degrees of freedom relevant to the prediction task, which is the content of the RG relevance criterion. (iii) Inter-layer kernel drift is concentrated at one or two specific transitions, with the remainder of the network near a fixed point, consistent with a discrete fixed-point plateau. Together these findings constitute the first quantitative, position-level evidence that MLP residual networks implement a selective coarse-graining procedure governed by the spectral structure of the input distribution.
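To make "known spectral properties" concrete, here is a hedged sketch of how a Markov chain's correlation length follows from its transition matrix. It assumes the standard relation that correlations decay as |λ₂|^t, so ξ = -1 / ln|λ₂|, where λ₂ is the second-largest eigenvalue in magnitude. The symmetric two-state chain and the helper names are hypothetical illustrations. The abstract does not specify the chains used in the paper.

```python
import numpy as np

def correlation_length(P):
    """xi = -1 / ln|lambda_2| for a row-stochastic transition matrix P."""
    mags = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
    return float(-1.0 / np.log(mags[1]))

def two_state_chain(xi):
    """Hypothetical symmetric two-state chain with correlation length xi.

    Its eigenvalues are 1 and 1 - 2q, so we pick the flip probability q
    such that 1 - 2q = exp(-1 / xi).
    """
    lam2 = np.exp(-1.0 / xi)
    q = (1.0 - lam2) / 2.0
    return np.array([[1.0 - q, q], [q, 1.0 - q]])

def sample_chain(P, length, rng):
    """Draw a state sequence from P, starting from a uniform random state."""
    n = P.shape[0]
    states = np.empty(length, dtype=np.int64)
    states[0] = rng.integers(n)
    for t in range(1, length):
        states[t] = rng.choice(n, p=P[states[t - 1]])
    return states

# Example: chains matching the two regimes quoted in the abstract.
short_chain = two_state_chain(1.0)  # correlation length about 1
long_chain = two_state_chain(7.0)   # correlation length about 7
tokens = sample_chain(long_chain, 64, np.random.default_rng(0))
```

In this toy setup a correlation length of 7 corresponds to a flip probability of about 0.067, so neighboring tokens are strongly informative about a masked position. At a correlation length of 1 the flip probability is about 0.32 and that information decays within a step or two.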

Problem

Research questions and friction points this paper is trying to address.

rank collapse

fixed points

renormalization group

MLP residual networks

coarse-graining

Innovation

Methods, ideas, or system contributions that make the work stand out.

rank collapse

renormalization group

residual networks