🤖 AI Summary
This work addresses the fundamental problem of whether layer outputs in neural networks are statistically sufficient for the target variable, i.e., whether they preserve the conditional distribution of the target given the input. To this end, we propose a graph-variable modeling framework: each network layer is formalized as a transformation over a graph structure, with neurons interpreted as pairwise functions between inputs and anchor nodes. Theoretically, we prove that layer outputs asymptotically achieve statistical sufficiency in the infinite-width limit; for finite-width networks, we derive sufficiency-preservation conditions under a region-separation assumption. Our framework unifies standard architectures, including fully connected and convolutional layers, as well as common activation functions (e.g., ReLU, sigmoid). This is the first study to jointly leverage graph-structural modeling and statistical sufficiency to establish a theoretical foundation for information retention in deep networks. Empirical validation confirms that sufficiency is attained both in the wide-network limit and in practical architectures, offering a novel paradigm for analyzing information flow in deep learning.
📝 Abstract
This paper analyzes neural networks through graph variables and statistical sufficiency. We interpret neural network layers as graph-based transformations, where neurons act as pairwise functions between inputs and learned anchor points. Within this formulation, we establish conditions under which layer outputs are sufficient statistics of the layer inputs for the target; that is, each layer preserves the conditional distribution of the target variable given the input variable. Under a dense-anchor-point assumption, we prove that asymptotic sufficiency holds in the infinite-width limit and is preserved throughout training. To align more closely with practical architectures, we further show that sufficiency can be achieved with finite-width networks by assuming region-separated input distributions and constructing appropriate anchor points. Our framework covers fully connected layers, general pairwise functions, ReLU and sigmoid activations, and convolutional neural networks. This work bridges statistical sufficiency, graph-theoretic representations, and deep learning, providing a new statistical understanding of neural networks.
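To make the pairwise-function reading of a layer concrete, the following is a minimal NumPy sketch, not code from the paper: we assume each row of a fully connected layer's weight matrix plays the role of an anchor point, and a ReLU neuron is the pairwise function g(x, a) = relu(⟨a, x⟩ + b). The names `pairwise_neuron` and `layer`, and this specific form of g, are illustrative assumptions.

```python
import numpy as np

def pairwise_neuron(x, anchor, bias=0.0):
    """One neuron as a pairwise function g(x, a) = relu(<a, x> + b).

    The inner-product-plus-ReLU form is an illustrative assumption about
    how a fully connected neuron fits the pairwise framework.
    """
    return max(0.0, float(np.dot(anchor, x)) + bias)

def layer(x, anchors, biases):
    """A width-m layer: m pairwise functions applied to the same input x."""
    return np.array([pairwise_neuron(x, a, b) for a, b in zip(anchors, biases)])

rng = np.random.default_rng(0)
x = rng.normal(size=4)             # input variable
anchors = rng.normal(size=(3, 4))  # 3 anchor points -> layer width 3
biases = np.zeros(3)

out = layer(x, anchors, biases)
# Sanity check: this coincides with the usual matrix form relu(W x + b),
# where W stacks the anchor points as rows.
assert np.allclose(out, np.maximum(anchors @ x + biases, 0.0))
```

Under this reading, widening the layer corresponds to adding anchor points, which is the regime in which the paper's infinite-width sufficiency result applies.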