Collective Kernel EFT for Pre-activation ResNets

📅 2026-04-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work models the layer-to-layer stochastic evolution of the empirical kernel $G$ in finite-width pre-activation ResNets. It develops a collective kernel effective field theory (EFT) based on a $G$-only closure hierarchy and uses the exact conditional Gaussianity of residual increments to derive an exact stochastic recursion for $G$. Systematic Gaussian approximations then yield a continuous-depth ODE system for three quantities: the mean kernel $K_0$, the kernel covariance $V_4$, and the $1/n$ mean correction $K_{1,\mathrm{EFT}}$, which arises diagrammatically as a one-loop tadpole correction. Numerically, $K_0$ remains accurate at all depths. The residual of the $V_4$ equation accumulates to an $O(1)$ error at finite time, driven mainly by approximation error in the $G$-only transport term, and $K_{1,\mathrm{EFT}}$ fails because the source closure breaks down, showing a systematic mismatch even at initialization. These results expose the limits of $G$-only state-space reduction and motivate extending the state space to include the sigma-kernel.

📝 Abstract

In finite-width deep neural networks, the empirical kernel $G$ evolves stochastically across layers. We develop a collective kernel effective field theory (EFT) for pre-activation ResNets based on a $G$-only closure hierarchy and diagnose its finite validity window. Exploiting the exact conditional Gaussianity of residual increments, we derive an exact stochastic recursion for $G$. Applying Gaussian approximations systematically yields a continuous-depth ODE system for the mean kernel $K_0$, the kernel covariance $V_4$, and the $1/n$ mean correction $K_{1,\mathrm{EFT}}$, which emerges diagrammatically as a one-loop tadpole correction. Numerically, $K_0$ remains accurate at all depths. However, the $V_4$ equation residual accumulates to an $O(1)$ error at finite time, primarily driven by approximation errors in the $G$-only transport term. Furthermore, $K_{1,\mathrm{EFT}}$ fails due to the breakdown of the source closure, which exhibits a systematic mismatch even at initialization. These findings highlight the limitations of $G$-only state-space reduction and suggest extending the state space to incorporate the sigma-kernel.
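The stochastic layer-to-layer evolution of $G$ can be illustrated with a small Monte Carlo sketch. The abstract does not specify the architecture details, so everything concrete here is an assumption made for illustration: the update $h^{\ell+1} = h^{\ell} + \sqrt{\epsilon}\, W^{\ell} \tanh(h^{\ell})$, weights $W^{\ell}_{ij} \sim \mathcal{N}(0, 1/n)$, a Gaussian input embedding, and the hypothetical function name `simulate_kernels`. Under this assumed parameterization, each residual increment is Gaussian conditional on the current activations, and the empirical kernel is $G_{ab} = \frac{1}{n} h_a \cdot h_b$. Averaging over independently initialized networks gives Monte Carlo estimates of the mean kernel and of the width-scaled kernel variance, which is the kind of data an ODE prediction for $K_0$ and $V_4$ could be compared against.

```python
import numpy as np


def simulate_kernels(x, width, depth, step, n_nets, seed=0):
    """Sample empirical kernels of random pre-activation residual networks.

    Assumed (not taken from the paper): tanh nonlinearity, residual update
    h <- h + sqrt(step) * W @ tanh(h), and W_ij ~ N(0, 1/width).

    Returns an array of shape (n_nets, depth + 1, m, m), where m is the
    number of inputs and entry [k, l] is G = h.T @ h / width at layer l.
    """
    rng = np.random.default_rng(seed)
    m, d = x.shape
    kernels = np.empty((n_nets, depth + 1, m, m))
    for k in range(n_nets):
        # Assumed input embedding: U_ij ~ N(0, 1/d), so h has shape (width, m).
        U = rng.normal(0.0, 1.0 / np.sqrt(d), size=(width, d))
        h = U @ x.T
        kernels[k, 0] = h.T @ h / width
        for layer in range(depth):
            W = rng.normal(0.0, 1.0 / np.sqrt(width), size=(width, width))
            # Conditional on h, this increment is Gaussian in W.
            h = h + np.sqrt(step) * (W @ np.tanh(h))
            kernels[k, layer + 1] = h.T @ h / width
    return kernels


# Two example inputs of unit norm (illustrative values only).
x = np.array([[1.0, 0.0], [0.6, 0.8]])
width = 64
G = simulate_kernels(x, width=width, depth=10, step=0.1, n_nets=200)

mean_G = G.mean(axis=0)                # Monte Carlo estimate of the mean kernel
scaled_var_G = width * G.var(axis=0)   # elementwise variance, scaled by width
```

Repeating the run at several widths and checking whether `scaled_var_G` stabilizes is one simple way to probe the $1/n$ scaling of kernel fluctuations. The sketch estimates only elementwise variances, not the full four-index covariance.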

Problem

Research questions and friction points this paper is trying to address.

collective kernel

effective field theory

pre-activation ResNets

finite-width networks

kernel evolution

Innovation

Methods, ideas, or system contributions that make the work stand out.

Collective Kernel EFT

Pre-activation ResNets

Gaussian Approximation