🤖 AI Summary
This work addresses the puzzle of why over-parameterized deep neural networks generalize well despite defying classical parameter-count-based generalization theories. From the perspective of representation learning, the paper proposes a generalization analysis framework that does not depend on model size. It characterizes the geometric convergence of learned embeddings via the Wasserstein distance and quantifies the sensitivity of the prediction mapping through its Lipschitz constant, thereby deriving an embedding-dependent generalization bound. Theoretically, the bound shows that generalization error is primarily governed by the intrinsic dimension of the embedding distribution. Empirical validation across diverse architectures and datasets demonstrates that this quantity correlates strongly with actual generalization performance and remains predictive under varying training conditions.
📝 Abstract
Deep neural networks often generalize well despite heavy over-parameterization, challenging classical parameter-based analyses. We study generalization from a representation-centric perspective and analyze how the geometry of learned embeddings controls predictive performance for a fixed trained model. We show that population risk can be bounded by two factors: (i) the intrinsic dimension of the embedding distribution, which determines the convergence rate of the empirical embedding distribution to the population distribution in Wasserstein distance, and (ii) the sensitivity of the downstream mapping from embeddings to predictions, characterized by Lipschitz constants. Together, these yield an embedding-dependent error bound that does not rely on parameter counts or hypothesis-class complexity. At the final embedding layer, architectural sensitivity vanishes and the bound is dominated by embedding dimension, explaining its strong empirical correlation with generalization performance. Experiments across architectures and datasets validate the theory and demonstrate the utility of embedding-based diagnostics.
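The two quantities in the bound are both measurable from data: a Wasserstein distance between empirical embedding distributions and a Lipschitz constant of the head mapping embeddings to predictions. The following is a minimal NumPy sketch of how such a bound-style diagnostic could be estimated in practice, not the paper's actual estimator. It uses a sliced (random-projection) approximation of the 1-Wasserstein distance, assumes equal sample sizes, and assumes a linear prediction head whose exact Lipschitz constant is its spectral norm; the helper names `sliced_wasserstein` and `linear_lipschitz` are illustrative.

```python
import numpy as np

def sliced_wasserstein(X, Y, n_proj=100, seed=0):
    """Monte-Carlo sliced 1-Wasserstein estimate between two empirical
    embedding distributions X, Y of shape (n, d) with equal n."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    dirs = rng.normal(size=(n_proj, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)  # unit directions
    total = 0.0
    for v in dirs:
        # 1-D W1 between sorted projections (quantile coupling)
        px, py = np.sort(X @ v), np.sort(Y @ v)
        total += np.mean(np.abs(px - py))
    return total / n_proj

def linear_lipschitz(W):
    """Lipschitz constant of a linear head x -> W x: the largest
    singular value (spectral norm) of W."""
    return np.linalg.svd(W, compute_uv=False)[0]

# Toy demo: two draws from the same embedding distribution stand in for
# "population" vs. "empirical" embeddings (synthetic data, for illustration).
rng = np.random.default_rng(1)
pop = rng.normal(size=(2000, 16))
emp = rng.normal(size=(2000, 16))
W = rng.normal(size=(10, 16)) / 4.0
# Bound-style diagnostic: head sensitivity times embedding-distribution gap.
gap_proxy = linear_lipschitz(W) * sliced_wasserstein(pop, emp)
```

As the sample size grows, the sliced Wasserstein term shrinks at a rate governed by the embedding distribution's intrinsic dimension, which is the mechanism the bound formalizes; multiplying by the head's Lipschitz constant converts the embedding-space gap into a gap in predictions.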