Distributional simplicity bias and effective convexity in Energy Based Models

📅 2026-05-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Training energy-based models is non-convex, which can lead to sensitivity to initialization, convergence to spurious local minima, and unstable gradients. This work analyzes the learning dynamics through the lens of the effective model, which can be read either as a generalized Ising model with higher-order interactions or as the Fourier expansion of the energy function, and studies the gradient flow induced by learning strictly positive distributions over binary variables. Under sufficient expressivity, the analysis identifies two types of fixed points: data-consistent points, which reproduce the target distribution exactly, and spurious points, which are stationary without matching it. Introducing the notion of “effective convexity,” the study shows that perturbations near data-consistent fixed points are either stable or neutral, with neutral directions leaving the effective model unchanged. It also uncovers a hierarchical learning mechanism in which lower-order interactions are learned before higher-order ones. This gives a mechanistic explanation for the distributional simplicity bias and clarifies why fixed points that are inconsistent with the data at low orders are not observed in practice.

📝 Abstract

Energy-based learning is a powerful framework for generative modelling, but its training is inherently non-convex, potentially leading to sensitivity to initialisation, poor local optima, and unstable gradient dynamics. We present a dynamical analysis of energy-based learning through the lens of the effective model, which can be interpreted as either a generalised Ising model with higher-order interactions or the Fourier expansion of the energy. Under sufficient expressivity, we show that the gradient flow induced by learning strictly positive distributions over binary variables admits two types of fixed points: data-consistent points, which exactly reproduce the target distribution, and spurious points, which satisfy stationarity without matching the target distribution. Around data-consistent points, we show that perturbations are either stable or neutral, with neutral directions leaving the effective model invariant. Finally, we show that gradient dynamics induce a hierarchy in which lower-order interactions are learned before higher-order ones. This provides a mechanistic explanation for the distributional simplicity bias and clarifies why fixed points that are not data-consistent at low orders are not observed in practice.
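
To make the "effective model" reading concrete, the sketch below expands an energy over binary variables in the Fourier (Walsh) basis, so that each coefficient plays the role of an interaction of a given order in a generalised Ising model. It is an illustration only and is not taken from the paper: it assumes spins in {-1, +1} and the sign convention E(s) = sum_S c_S * prod_{i in S} s_i, and the names `walsh_coefficients`, `reconstruct`, and `toy_energy` are hypothetical.

```python
import itertools
import numpy as np


def walsh_coefficients(energy, n):
    """Return {subset: c_S} with energy(s) = sum_S c_S * prod_{i in S} s_i.

    Assumes s takes values in {-1, +1}^n (an assumption of this sketch).
    """
    states = np.array(list(itertools.product([-1, 1], repeat=n)))
    values = np.array([energy(s) for s in states], dtype=float)
    coeffs = {}
    for order in range(n + 1):
        for subset in itertools.combinations(range(n), order):
            # Parity function of the subset; the empty subset gives all ones.
            chi = states[:, list(subset)].prod(axis=1)
            # Parity functions are orthonormal under the uniform average.
            coeffs[subset] = float(np.mean(values * chi))
    return coeffs


def reconstruct(coeffs, s):
    """Evaluate the expansion at configuration s."""
    return sum(c * np.prod([s[i] for i in subset]) for subset, c in coeffs.items())


def toy_energy(s):
    # Hypothetical energy with a field, a pairwise and a three-body term.
    return -0.5 * s[0] - 1.0 * s[0] * s[1] + 0.3 * s[0] * s[1] * s[2]


coeffs = walsh_coefficients(toy_energy, 3)

# Group the non-zero interactions by order (size of the subset).
by_order = {}
for subset, c in coeffs.items():
    if abs(c) > 1e-12:
        by_order.setdefault(len(subset), []).append((subset, c))

for order in sorted(by_order):
    print(order, by_order[order])
```

Grouping coefficients by subset size gives the "orders" that the abstract refers to: order 1 corresponds to fields, order 2 to pairwise couplings, and so on. The enumeration is exponential in `n`, so this is only practical for a handful of variables.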

Problem

Research questions and friction points this paper is trying to address.

energy-based models

non-convex optimization

spurious fixed points

distributional simplicity bias

gradient dynamics

Innovation

Methods, ideas, or system contributions that make the work stand out.

energy-based models

simplicity bias

effective convexity