Understanding Generalization from Embedding Dimension and Distributional Convergence

📅 2026-01-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the puzzle of why over-parameterized deep neural networks generalize well despite defying classical parameter-count-based generalization theories. From the perspective of representation learning, the paper proposes a novel generalization analysis framework that eschews dependence on model size. It characterizes the geometric convergence of learned embeddings via the Wasserstein distance and quantifies the sensitivity of the prediction mapping through its Lipschitz constant, thereby deriving an embedding-dependent generalization bound. Theoretically, the bound reveals that generalization error is primarily governed by the embedding dimensionality. Empirical validation across diverse architectures and datasets demonstrates that the embedding-dimension metric correlates strongly with actual generalization performance and remains consistently predictive under varying conditions.

📝 Abstract
Deep neural networks often generalize well despite heavy over-parameterization, challenging classical parameter-based analyses. We study generalization from a representation-centric perspective and analyze how the geometry of learned embeddings controls predictive performance for a fixed trained model. We show that population risk can be bounded by two factors: (i) the intrinsic dimension of the embedding distribution, which determines the convergence rate of the empirical embedding distribution to the population distribution in Wasserstein distance, and (ii) the sensitivity of the downstream mapping from embeddings to predictions, characterized by Lipschitz constants. Together, these yield an embedding-dependent error bound that does not rely on parameter counts or hypothesis class complexity. At the final embedding layer, architectural sensitivity vanishes and the bound is dominated by embedding dimension, explaining its strong empirical correlation with generalization performance. Experiments across architectures and datasets validate the theory and demonstrate the utility of embedding-based diagnostics.
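The abstract's two-factor decomposition can be written schematically as follows; the symbols here are illustrative placeholders, not necessarily the paper's exact notation:

```latex
% phi: encoder mapping inputs to embeddings; h: downstream predictor;
% mu: population embedding distribution; mu_n: its n-sample empirical version;
% d: intrinsic dimension of mu. If the loss composed with h is Lipschitz,
% the generalization gap is controlled by distributional convergence:
R(h \circ \varphi) - \widehat{R}_n(h \circ \varphi)
  \;\lesssim\; \mathrm{Lip}(h)\, W_1\!\left(\hat{\mu}_n, \mu\right),
\qquad
\mathbb{E}\, W_1\!\left(\hat{\mu}_n, \mu\right) = O\!\left(n^{-1/d}\right).
```

The $n^{-1/d}$ rate is the standard empirical Wasserstein convergence rate for $d$-dimensional distributions, which is why the intrinsic embedding dimension, rather than parameter count, dominates the bound.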
Problem

Research questions and friction points this paper is trying to address.

generalization
embedding dimension
distributional convergence
over-parameterization
representation learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

embedding dimension
distributional convergence
generalization bound
Wasserstein distance
Lipschitz sensitivity
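The distributional-convergence ingredient can be sketched numerically. The snippet below (an illustrative sketch, not the paper's code) computes the 1-D Wasserstein-1 distance between equal-size empirical samples via the quantile coupling and shows it shrinking as the sample size grows, mirroring the convergence term in the bound:

```python
import numpy as np

rng = np.random.default_rng(0)

def w1_empirical(x, y):
    """W1 distance between two equal-size 1-D empirical distributions.
    For equal sample sizes the optimal transport plan matches quantiles,
    so W1 is the mean absolute difference of the sorted samples."""
    return np.mean(np.abs(np.sort(x) - np.sort(y)))

# A large sample stands in for the population embedding distribution.
pop = rng.normal(size=100_000)

results = {}
for n in (100, 1_000, 10_000):
    x = rng.normal(size=n)                       # n-sample empirical distribution
    y = rng.choice(pop, size=n, replace=False)   # equal-size population proxy
    results[n] = w1_empirical(x, y)
    print(f"n={n}: W1 ~ {results[n]:.4f}")
```

In 1-D the gap decays roughly like n^(-1/2); in higher intrinsic dimension d the analogous decay slows to n^(-1/d), which is the mechanism behind the embedding-dimension dependence of the bound.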
Authors

Junjie Yu — Southern University of Science and Technology (Deep Learning, Neuroscience)
Zhuoli Ouyang — Department of Electronic and Electrical Engineering, Southern University of Science and Technology
Haotian Deng — ByteDance (Computer Networking)
Chen Wei — Department of Biomedical Engineering, Southern University of Science and Technology
Wenxiao Ma — Department of Biomedical Engineering, Southern University of Science and Technology
Jianyu Zhang — Department of Biomedical Engineering, Southern University of Science and Technology
Zihan Deng — Department of Biomedical Engineering, Southern University of Science and Technology
Quanying Liu — Department of Biomedical Engineering, Southern University of Science and Technology