🤖 AI Summary
MCMC suffers from slow convergence near critical phase transitions and in rugged energy landscapes, and conventional parallel tempering (PT) still relies on inefficient local spin updates within each replica. This paper proposes IsingFormer, a PT framework that integrates a Transformer-based generative model of entire spin configurations, trained on equilibrium samples, whose outputs serve as global Metropolis proposals alongside the usual single-spin flips. By combining deep generative modeling with classical MCMC, IsingFormer significantly accelerates equilibration. Experiments show accurate reproduction of magnetization and free-energy curves for the 2D Ising model, with generalization to unseen temperatures including the critical region; on 3D spin glasses it reaches substantially lower-energy states, and on integer factorization tasks encoded as Ising problems it transfers to unseen semiprimes, improving optimization success rates. The core contribution is a PT paradigm with learned global updates that generalizes across temperatures and problem instances.
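For context, the replica-exchange step that PT itself performs is the standard textbook rule: neighboring temperatures swap configurations with probability min(1, exp((β_i − β_j)(E_i − E_j))). A minimal sketch of that rule (not code from the paper; function and variable names are illustrative):

```python
import math
import random

def pt_swap_sweep(states, energies, betas):
    """One sweep of replica swaps between neighboring temperatures.

    Standard parallel-tempering criterion: swap replicas i and i+1
    with probability min(1, exp((beta_i - beta_{i+1}) * (E_i - E_{i+1}))).
    Mutates and returns the states and energies lists.
    """
    for i in range(len(betas) - 1):
        log_alpha = (betas[i] - betas[i + 1]) * (energies[i] - energies[i + 1])
        if log_alpha >= 0 or random.random() < math.exp(log_alpha):
            states[i], states[i + 1] = states[i + 1], states[i]
            energies[i], energies[i + 1] = energies[i + 1], energies[i]
    return states, energies
```

Swaps move configurations between temperatures, but within each replica the configuration still evolves only by local flips; the paper's contribution is to add learned global moves at that inner level.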
📝 Abstract
Markov Chain Monte Carlo (MCMC) underlies both statistical physics and combinatorial optimization, but mixes slowly near critical points and in rough landscapes. Parallel Tempering (PT) improves mixing by swapping replicas across temperatures, yet each replica still relies on slow local updates to change its configuration. We introduce IsingFormer, a Transformer trained on equilibrium samples that can generate entire spin configurations resembling those from the target distribution. These uncorrelated samples are used as proposals for global moves within a Metropolis step in PT, complementing the usual single-spin flips. On 2D Ising models (sampling), IsingFormer reproduces magnetization and free-energy curves and generalizes to unseen temperatures, including the critical region. Injecting even a single proposal sharply reduces equilibration time, replacing thousands of local updates. On 3D spin glasses (optimization), PT enhanced with IsingFormer finds substantially lower-energy states, demonstrating how global moves accelerate search in rugged landscapes. Finally, applied to integer factorization encoded as Ising problems, IsingFormer trained on a limited set of semiprimes transfers successfully to unseen semiprimes, boosting success rates beyond the training distribution. Since factorization is a canonical hard benchmark, this ability to generalize across instances highlights the potential of learning proposals that move beyond single problems to entire families of instances. The IsingFormer demonstrates that Monte Carlo methods can be systematically accelerated by neural proposals that capture global structure, yielding faster sampling and stronger performance in combinatorial optimization.
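The global move described in the abstract is a Metropolis–Hastings step with an independence proposal: a candidate configuration drawn from the learned model q (tractable for an autoregressive Transformer) is accepted with probability min(1, e^{−β(E_new − E_old)} · q(old)/q(new)). A minimal sketch of that acceptance test, assuming log-probabilities from the model are available (names are illustrative, not the paper's API):

```python
import math
import random

def accept_global_move(beta, e_old, e_new, logq_old, logq_new):
    """Metropolis-Hastings acceptance for an independence proposal.

    The candidate configuration was sampled from a learned model q, so the
    Hastings correction uses q's log-probabilities of the old and new states:
        alpha = min(1, exp(-beta * (e_new - e_old)) * exp(logq_old - logq_new))
    Returns True if the whole-configuration move should be accepted.
    """
    log_alpha = -beta * (e_new - e_old) + logq_old - logq_new
    return log_alpha >= 0 or random.random() < math.exp(log_alpha)
```

On acceptance the replica's configuration is replaced wholesale, doing in one step what would otherwise take many single-spin flips; on rejection the chain keeps its current state, so detailed balance with respect to e^{−βE} is preserved.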