AI Summary
This work models the layer-to-layer stochastic evolution of the empirical kernel \( G \) in finite-width pre-activation ResNets. It develops a collective kernel effective field theory (EFT) built on a \( G \)-only closure hierarchy. The exact conditional Gaussianity of the residual increments gives an exact stochastic recursion for \( G \); systematic Gaussian approximations then reduce it to an ordinary differential equation (ODE) system in the continuous-depth limit. The system tracks three quantities: the mean kernel \( K_0 \), the kernel covariance \( V_4 \), and the \( 1/n \) mean correction \( K_{1,\mathrm{EFT}} \), which arises diagrammatically as a one-loop tadpole correction. Numerically, \( K_0 \) stays accurate at all depths; the residual of the \( V_4 \) equation grows to an \( O(1) \) error at finite time, driven mainly by approximation error in the \( G \)-only transport term; and \( K_{1,\mathrm{EFT}} \) fails because the source closure breaks down, with a systematic mismatch already at initialization. Together these results delimit the validity window of \( G \)-only state-space reduction and motivate extending the state space to include the sigma-kernel.
Abstract
In finite-width deep neural networks, the empirical kernel $G$ evolves stochastically across layers. We develop a collective kernel effective field theory (EFT) for pre-activation ResNets based on a $G$-only closure hierarchy and diagnose its finite validity window. Exploiting the exact conditional Gaussianity of residual increments, we derive an exact stochastic recursion for $G$. Applying Gaussian approximations systematically yields a continuous-depth ODE system for the mean kernel $K_0$, the kernel covariance $V_4$, and the $1/n$ mean correction $K_{1,\mathrm{EFT}}$, which emerges diagrammatically as a one-loop tadpole correction. Numerically, $K_0$ remains accurate at all depths. However, the $V_4$ equation residual accumulates to an $O(1)$ error at finite time, primarily driven by approximation errors in the $G$-only transport term. Furthermore, $K_{1,\mathrm{EFT}}$ fails due to the breakdown of the source closure, which exhibits a systematic mismatch even at initialization. These findings highlight the limitations of $G$-only state-space reduction and suggest extending the state space to incorporate the sigma-kernel.
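The conditional Gaussianity behind the exact recursion for $G$ can be illustrated with a small Monte Carlo sketch. It is not the paper's code, and every modelling choice in it is an assumption made for illustration: the layer update $h \leftarrow h + \sqrt{\epsilon}\, W \phi(h)$ with $W_{ij} \sim \mathcal{N}(0, c_W/n)$, a ReLU nonlinearity, exactly two inputs, and the hypothetical names `kernel`, `residual_step`, and `simulate`. Under those assumptions, conditioned on the current preactivations, each neuron's increment is an independent Gaussian whose covariance over the inputs is $\epsilon\, c_W$ times the empirical kernel of the activations, so a layer can be sampled without ever building $W$.

```python
import math
import random


def relu(x):
    return x if x > 0.0 else 0.0


def kernel(rows):
    """Empirical 2x2 kernel of n rows (a, b): returns (G00, G01, G11)."""
    n = len(rows)
    g00 = sum(a * a for a, _ in rows) / n
    g01 = sum(a * b for a, b in rows) / n
    g11 = sum(b * b for _, b in rows) / n
    return g00, g01, g11


def residual_step(h, eps, c_w, rng):
    """One layer h <- h + sqrt(eps) * W relu(h), sampled without building W.

    Assumed parametrization (not taken from the paper): W_ij ~ N(0, c_w / n).
    Conditioned on h, every neuron's increment over the two inputs is an
    independent N(0, eps * c_w * Sigma) draw, where Sigma is the empirical
    kernel of the activations relu(h).
    """
    s00, s01, s11 = kernel([(relu(a), relu(b)) for a, b in h])
    v00, v01, v11 = eps * c_w * s00, eps * c_w * s01, eps * c_w * s11
    # Cholesky factor of the 2x2 increment covariance (guarded against v00 == 0).
    l00 = math.sqrt(v00)
    l10 = v01 / l00 if l00 > 0.0 else 0.0
    l11 = math.sqrt(max(v11 - l10 * l10, 0.0))
    out = []
    for a, b in h:
        z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        out.append((a + l00 * z0, b + l10 * z0 + l11 * z1))
    return out


def simulate(n, depth, eps, c_w, rho, trials, seed):
    """Monte Carlo mean and variance of G00 after `depth` layers at width n."""
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        # Initial preactivations: n i.i.d. unit-variance Gaussian pairs with correlation rho.
        h = []
        for _ in range(n):
            z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            h.append((z0, rho * z0 + math.sqrt(1.0 - rho * rho) * z1))
        for _ in range(depth):
            h = residual_step(h, eps, c_w, rng)
        samples.append(kernel(h)[0])
    mean = sum(samples) / trials
    var = sum((g - mean) ** 2 for g in samples) / (trials - 1)
    return mean, var


if __name__ == "__main__":
    mean_g00, var_g00 = simulate(n=256, depth=10, eps=0.1, c_w=2.0, rho=0.5, trials=100, seed=0)
    print("sampled mean of G00:", mean_g00)
    print("infinite-width reference (1 + eps)^depth:", 1.1 ** 10)
    print("n * Var(G00):", 256 * var_g00)
```

With ReLU and $c_W = 2$ in this assumed setup, the infinite-width diagonal entry obeys $K_{00} \leftarrow (1+\epsilon) K_{00}$, so the sampled mean can be compared against $(1+\epsilon)^{\text{depth}}$. The scaled variance $n \cdot \mathrm{Var}(G_{00})$ is only a rough stand-in for a diagonal entry of $V_4$, whose normalization in the paper may differ. Note that the increment covariance is set by the kernel of the activations rather than by $G$ itself, which is why a $G$-only description requires a closure approximation.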