On the Ability of Deep Networks to Learn Symmetries from Data: A Neural Kernel Theory

📅 2024-12-16
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
This paper asks whether deep networks trained with standard supervision can learn symmetries that are not built into their architecture, such as rotation invariance, when those symmetries are only partially observed in the training data: some classes include every transformation of a cyclic group, while others include only a subset. Method: The authors derive a neural kernel theory of symmetry learning in the infinite-width limit, using the cyclic group structure of the dataset to analyze the kernel spectrum in the Fourier domain. Contribution/Results: The theory characterizes the generalization error through the interaction between class separation (signal) and class-orbit density (noise). Generalization to unseen transformations succeeds only when the local structure of the data prevails over its non-local, symmetric structure in the kernel space defined by the architecture. The theory reproduces the generalization failure of finite-width MLPs, CNNs, and ViTs trained on partially observed versions of rotated-MNIST, and the authors conclude that conventional networks trained with supervision lack a mechanism to learn symmetries that are not embedded in their architecture a priori.

📝 Abstract
Symmetries (transformations by group actions) are present in many datasets, and leveraging them holds significant promise for improving predictions in machine learning. In this work, we aim to understand when and how deep networks can learn symmetries from data. We focus on a supervised classification paradigm where data symmetries are only partially observed during training: some classes include all transformations of a cyclic group, while others include only a subset. We ask: can deep networks generalize symmetry invariance to the partially sampled classes? In the infinite-width limit, where kernel analogies apply, we derive a neural kernel theory of symmetry learning to address this question. The group-cyclic nature of the dataset allows us to analyze the spectrum of neural kernels in the Fourier domain; here we find a simple characterization of the generalization error as a function of the interaction between class separation (signal) and class-orbit density (noise). We observe that generalization can only be successful when the local structure of the data prevails over its non-local, symmetric, structure, in the kernel space defined by the architecture. This occurs when (1) classes are sufficiently distinct and (2) class orbits are sufficiently dense. Our framework also applies to equivariant architectures (e.g., CNNs), and recovers their success in the special case where the architecture matches the inherent symmetry of the data. Empirically, our theory reproduces the generalization failure of finite-width networks (MLP, CNN, ViT) trained on partially observed versions of rotated-MNIST. We conclude that conventional networks trained with supervision lack a mechanism to learn symmetries that have not been explicitly embedded in their architecture a priori. Our framework could be extended to guide the design of architectures and training procedures able to learn symmetries from data.
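
The Fourier-domain step mentioned in the abstract rests on a standard fact: when a kernel is invariant to the group action, its Gram matrix on a cyclic-group orbit is circulant, so the discrete Fourier transform diagonalizes it. The sketch below illustrates only that fact and is not the authors' derivation. It assumes a toy setting that is not from the paper: a 2-D point rotated by multiples of 2π/n, and an RBF kernel standing in for a neural kernel. The helper names `rotation_orbit` and `rbf_gram` are hypothetical.

```python
import numpy as np

def rotation_orbit(x, n):
    """Return the n points obtained by rotating the 2-D point x by 2*pi*k/n."""
    angles = 2 * np.pi * np.arange(n) / n
    c, s = np.cos(angles), np.sin(angles)
    # Row k is the rotation by angles[k] applied to x.
    return np.stack([c * x[0] - s * x[1], s * x[0] + c * x[1]], axis=1)

def rbf_gram(points, gamma=1.0):
    """Gram matrix of an RBF kernel (assumed stand-in for a neural kernel)."""
    sq_dists = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq_dists)

n = 8
orbit = rotation_orbit(np.array([1.0, 0.5]), n)
K = rbf_gram(orbit)

# The kernel depends only on the relative rotation (j - i) mod n,
# so every row of K is a cyclic shift of the first row.
first_row = K[0]
circulant = np.stack([np.roll(first_row, i) for i in range(n)])
is_circulant = np.allclose(K, circulant)

# Eigenvalues of a circulant matrix are the DFT of its first row;
# they are real here because K is symmetric.
fourier_eigs = np.sort(np.fft.fft(first_row).real)
direct_eigs = np.sort(np.linalg.eigvalsh(K))
spectra_match = np.allclose(fourier_eigs, direct_eigs)

print(is_circulant, spectra_match)
```

In this toy case the spectrum of the orbit Gram matrix can be read off from a single FFT of one row, which is the kind of simplification that makes a frequency-domain analysis of the kernel tractable.
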
Problem

Research questions and friction points this paper is trying to address.

Understanding when deep networks learn symmetries from data
Analyzing generalization error in partially observed symmetry datasets
Exploring limitations of conventional networks in learning implicit symmetries
Innovation

Methods, ideas, or system contributions that make the work stand out.

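A neural kernel theory of symmetry learning, derived in the infinite-width limit
Fourier-domain analysis of the kernel Gram matrix, enabled by the cyclic group structure of the dataset
Generalization succeeds only when the data's local structure prevails over its non-local, symmetric structure in kernel space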