High-entropy Advantage in Neural Networks' Generalizability

📅 2025-03-17
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Conventional machine learning often neglects physical principles, leading to suboptimal generalization and limited theoretical grounding for optimization. Method: We model neural networks as one-dimensional non-interacting particle systems and introduce statistical mechanical entropy to characterize model states. Using the Wang–Landau algorithm, we construct entropy–generalization landscapes for million-parameter networks. Contribution/Results: We discover a pronounced “entropy advantage”: high-entropy solutions consistently outperform low-entropy minima found by SGD and other standard optimizers—by up to 2.3× in narrow networks—challenging the universality assumption of SGD. Across arithmetic reasoning, tabular data, image classification, and language modeling, high-entropy states yield average test accuracy gains of 3.2–7.8%. This work establishes a new physics-informed paradigm for optimizer design, grounded in statistical mechanics and providing both theoretical justification and empirical validation for entropy-driven optimization.
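As background on the method named above, statistical-mechanical entropy is conventionally defined through a density of states, and the Wang–Landau algorithm estimates that density with an on-the-fly update. The relations below are the textbook forms, with training loss L playing the role of energy; this mapping is an assumption about how the paper frames the quantities, not a statement of its exact formulation.

$$
S(L) = \ln g(L), \qquad g(L) := \text{number of parameter configurations with training loss } L,
$$

$$
\ln g(L_{\mathrm{current}}) \;\leftarrow\; \ln g(L_{\mathrm{current}}) + \ln f, \qquad f \to \sqrt{f}\ \text{whenever the visit histogram becomes flat},
$$

with proposed parameter moves accepted with probability $\min\!\big(1,\ g(L_{\mathrm{old}})/g(L_{\mathrm{new}})\big)$.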

📝 Abstract
While the 2024 Nobel Prize in Physics has ignited a worldwide discussion on the origins of neural networks and their foundational links to physics, modern machine learning research predominantly focuses on computational and algorithmic advancements, overlooking the underlying physical picture. Here we introduce the concept of entropy into neural networks by reconceptualizing them as hypothetical physical systems in which each parameter is a non-interacting 'particle' within a one-dimensional space. By employing the Wang-Landau algorithm, we construct the entropy landscapes of neural networks (with up to 1 million parameters) as functions of training loss and test accuracy (or loss) across four distinct machine learning tasks: arithmetic questions, real-world tabular data, image recognition, and language modeling. Our results reveal the existence of an 'entropy advantage', whereby high-entropy states generally outperform the states reached via classical training optimizers such as stochastic gradient descent. We also find that this advantage is more pronounced in narrower networks, indicating a need for training optimizers tailored to different network sizes.
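A minimal sketch of how such an entropy landscape could be constructed with the Wang-Landau algorithm is shown below. It assumes a toy two-input network, MSE training loss binned over [0, 1), single-parameter Gaussian proposal moves, and a 0.8-flatness criterion; none of these choices, nor the helper names `loss` and `bin_of`, come from the paper, which works at million-parameter scale with tasks and binning of its own.

```python
# Hedged sketch: Wang-Landau random walk over a tiny network's parameter space,
# estimating the log-density of states ln g(L) as a function of training loss L.
# All numeric choices (bin count, move size, flatness threshold) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Toy data and a tiny 2-4-1 tanh/sigmoid network (the paper uses far larger models).
X = rng.normal(size=(32, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)

def loss(theta):
    """MSE training loss for flattened parameter vector theta (17 values)."""
    W1, b1 = theta[:8].reshape(2, 4), theta[8:12]
    W2, b2 = theta[12:16], theta[16]
    h = np.tanh(X @ W1 + b1)
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
    return float(np.mean((p - y) ** 2))

n_params = 17
lo, hi, n_bins = 0.0, 1.0, 50                      # loss window and bin count (assumptions)
bin_of = lambda L: min(int((L - lo) / (hi - lo) * n_bins), n_bins - 1)

ln_g = np.zeros(n_bins)                            # running estimate of ln g(L)
hist = np.zeros(n_bins)                            # visit histogram for the flatness check
ln_f = 1.0                                         # Wang-Landau modification factor

theta = rng.normal(scale=0.5, size=n_params)
L_cur = loss(theta)

while ln_f > 1e-3:
    for _ in range(20000):
        # Single-parameter move, in the spirit of treating each parameter as a 1D "particle".
        prop = theta.copy()
        prop[rng.integers(n_params)] += rng.normal(scale=0.1)
        L_new = loss(prop)
        if lo <= L_new < hi:
            delta = ln_g[bin_of(L_cur)] - ln_g[bin_of(L_new)]
            if rng.random() < np.exp(min(0.0, delta)):   # accept with prob min(1, g_old/g_new)
                theta, L_cur = prop, L_new
        b = bin_of(L_cur)
        ln_g[b] += ln_f
        hist[b] += 1
    visited = hist[hist > 0]
    if visited.min() > 0.8 * visited.mean():       # flat-histogram criterion (assumption)
        hist[:] = 0
        ln_f /= 2.0

S = ln_g - ln_g.max()                              # entropy S(L) up to an additive constant
print(np.round(S, 2))
```

The accumulated `ln_g` array, up to an additive constant, is the entropy landscape S(L) over training loss; pairing configurations sampled in each loss bin with their test accuracy would, under this reading, yield the kind of entropy-generalization landscape the abstract describes.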
Problem

Research questions and friction points this paper is trying to address.

Introducing the concept of entropy into neural networks
Exploring the performance of high-entropy states in neural networks
Investigating the entropy advantage across different network sizes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces the concept of entropy into neural networks
Uses the Wang-Landau algorithm to construct entropy landscapes
High-entropy states outperform classical training optimizers
Entao Yang
AI Research Scientist @ Air Liquide | PhD from University of Pennsylvania
polymer physics, polymer nanocomposites, machine learning
Xiaotian Zhang
Department of Physics, City University of Hong Kong, Hong Kong, China
Yue Shang
Drexel University
NLU/NLG, ML, information retrieval, relevance, semantic match
Ge Zhang
Department of Physics, City University of Hong Kong, Hong Kong, China