🤖 AI Summary
To address the training instability of Energy-Based Models (EBMs) that stems from relying on a one-sided variational lower bound, this paper proposes a bidirectional bound optimization framework. Within a minimax paradigm, it simultaneously maximizes a lower bound on the log-likelihood, characterized jointly by the generator's Jacobian singular values and by mutual information, and minimizes an upper bound on the negative log-likelihood, formulated via a gradient penalty and via diffusion process modeling. This is the first work to introduce such a systematic bidirectional constraint mechanism, overcoming the instability inherent in conventional EBM training that depends solely on lower-bound optimization. The method integrates variational inference, adversarial training, Jacobian spectral analysis, mutual information estimation, and diffusion modeling, and it significantly improves training stability, log-likelihood estimation accuracy, and sample quality. Extensive experiments on multiple benchmark datasets validate the complementarity and effectiveness of the proposed bidirectional bounds.
📝 Abstract
Energy-based models (EBMs) estimate unnormalized densities in an elegant framework, but they are generally difficult to train. Recent work has linked EBMs to generative adversarial networks by noting that they can be trained through a minimax game using a variational lower bound. To avoid the instabilities caused by minimizing a lower bound, we propose to instead work with bidirectional bounds, meaning that we maximize a lower bound and minimize an upper bound when training the EBM. We investigate four bounds on the log-likelihood derived from different perspectives. We derive lower bounds based on the singular values of the generator Jacobian and on mutual information. To upper bound the negative log-likelihood, we consider a gradient penalty-like bound, as well as one based on diffusion processes. In all cases, we provide algorithms for evaluating the bounds. We compare the bounds to investigate the pros and cons of each approach. Finally, we demonstrate that the use of bidirectional bounds stabilizes EBM training and yields high-quality density estimation and sample generation.
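The core training idea above, maximizing a lower bound on the log-likelihood while minimizing an upper bound on the negative log-likelihood, can be illustrated with a deliberately tiny sketch. This is not the paper's method (the actual bounds involve generator Jacobians, mutual information, gradient penalties, and diffusion processes); here the "bounds" are placeholder functions that sandwich a 1-D toy log-likelihood with a constant slack `GAP`, just to show the two simultaneous gradient steps.

```python
def log_likelihood(theta):
    """Toy 1-D log-likelihood, maximized at theta = 2.0 (stands in for log p(x))."""
    return -(theta - 2.0) ** 2

GAP = 0.5  # constant slack standing in for the tightness of each bound (an assumption)

def lower_bound_ll(theta):
    # placeholder lower bound on the log-likelihood (to be maximized)
    return log_likelihood(theta) - GAP

def upper_bound_nll(theta):
    # placeholder upper bound on the negative log-likelihood (to be minimized)
    return -log_likelihood(theta) + GAP

def grad(f, theta, eps=1e-5):
    """Central finite-difference gradient, to keep the sketch dependency-free."""
    return (f(theta + eps) - f(theta - eps)) / (2 * eps)

theta, lr = -3.0, 0.05
for _ in range(200):
    # Bidirectional update: ascend the lower bound and descend the upper bound.
    theta += lr * grad(lower_bound_ll, theta)
    theta -= lr * grad(upper_bound_nll, theta)

print(round(theta, 3))  # settles near the maximum-likelihood value 2.0
```

In this toy the two bounds share the same optimum, so both steps push `theta` the same way; in the paper's setting the lower and upper bounds come from different constructions, and squeezing the likelihood from both sides is what stabilizes training.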