🤖 AI Summary
This work investigates how parameter symmetry and network expressivity jointly affect the generalization of neural networks learning a real-space renormalization group (RG) transformation, using the central limit theorem (CLT), viewed as an RG map, as the benchmark task.
Method: We analytically characterize how cumulants propagate through MLPs and GNNs while varying weight symmetries and activation functions across architectures, establishing a trade-off between symmetry preservation and expressive capacity.
Contribution/Results: We recast the CLT as a cumulant recursion relation and use an established cumulant-propagation framework to analytically explain the poor generalization of certain constrained MLP architectures. We also empirically validate an extension of this framework from MLPs to GNNs, elucidating the hierarchical information processing performed by these models and connecting it to the principle of parameter symmetry breaking and restoration. Crucially, we demonstrate that both excessive symmetry constraints and overly complex architectures degrade generalization, indicating an optimal balance between invariance and expressivity.
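To make the benchmark concrete (the notation below is ours, not drawn from the paper): for i.i.d. inputs $X_1, \dots, X_m$ with cumulants $\kappa_n$, additivity and $n$-th-order homogeneity of cumulants give, for one coarse-graining step $Y = m^{-1/2} \sum_{i=1}^{m} X_i$,

$$\kappa_n(Y) = m \cdot m^{-n/2}\, \kappa_n(X) = m^{1 - n/2}\, \kappa_n(X),$$

so $\kappa_2$ is invariant while every $\kappa_n$ with $n \ge 3$ contracts under iteration: the Gaussian is the fixed point of this recursion, which is precisely the CLT read as an RG map.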
📝 Abstract
Deep learning models have proven enormously successful at using multiple layers of representation to learn relevant features of structured data. Encoding physical symmetries into these models can improve performance on difficult tasks, and recent work has motivated the principle of parameter symmetry breaking and restoration as a unifying mechanism underlying their hierarchical learning dynamics. We evaluate the role of parameter symmetry and network expressivity in the generalisation behaviour of neural networks when learning a real-space renormalisation group (RG) transformation, using the central limit theorem (CLT) as a test case map. We consider simple multilayer perceptrons (MLPs) and graph neural networks (GNNs), and vary weight symmetries and activation functions across architectures. Our results reveal a competition between symmetry constraints and expressivity, with overly complex or overconstrained models generalising poorly. We analytically demonstrate this poor generalisation behaviour for certain constrained MLP architectures by recasting the CLT as a cumulant recursion relation and making use of an established framework to propagate cumulants through MLPs. We also empirically validate an extension of this framework from MLPs to GNNs, elucidating the internal information processing performed by these more complex models. These findings offer new insight into the learning dynamics of symmetric networks and their limitations in modelling structured physical transformations.
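As an illustrative aside (this sketch and all names in it are ours, not the paper's code), the cumulant recursion above can be checked numerically by applying the block-averaging map to samples from a non-Gaussian distribution:

```python
import numpy as np

def coarse_grain(x, block=2):
    """One RG step: sum non-overlapping blocks and rescale by sqrt(block),
    which keeps the variance (kappa_2) fixed."""
    x = x[: (len(x) // block) * block].reshape(-1, block)
    return x.sum(axis=1) / np.sqrt(block)

def cumulants(x):
    """First four sample cumulants via central moments
    (kappa_4 = mu_4 - 3 * mu_2**2)."""
    c = x - x.mean()
    m2, m3, m4 = (c**2).mean(), (c**3).mean(), (c**4).mean()
    return x.mean(), m2, m3, m4 - 3 * m2**2

rng = np.random.default_rng(0)
# Shifted exponential: zero mean, unit variance, kappa_3 = 2, kappa_4 = 6.
x = rng.exponential(size=2**22) - 1.0
for step in range(6):
    print(step, ["%+.4f" % k for k in cumulants(x)])
    x = coarse_grain(x)
```

With `block=2`, each step leaves $\kappa_2$ fixed while shrinking $\kappa_3$ by $2^{-1/2}$ and $\kappa_4$ by $2^{-1}$, matching $\kappa_n \mapsto m^{1-n/2}\kappa_n$ and flowing the distribution toward the Gaussian fixed point.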