🤖 AI Summary
Exact marginal inference in discrete graphical models remains challenging on graphs with high treewidth or frustration, where accuracy and scalability are difficult to reconcile. This work proposes an autoregressive Graph Transformer that, for the first time, integrates the sequential structure of variable elimination into a graph neural network. The model mimics the exact elimination procedure, using tensor-train compression to approximate the intermediate factors. A Dirichlet output layer and weighted conformal prediction provide calibrated, distribution-free coverage guarantees, and the compression error is proven to propagate at most linearly through the autoregressive chain. The authors report state-of-the-art performance across all benchmarks: the method reduces mean absolute error (MAE) from 0.041 (best baseline) to 0.020 on standard instances and attains an MAE of 0.048 on frustrated spin-glass problems with N=500, where Belief Propagation diverges entirely.
📝 Abstract
Marginal inference in discrete graphical models forces a choice between exactness and scalability: exact algorithms are intractable for high-treewidth graphs, while iterative approximations (Belief Propagation (BP), variational methods) sacrifice convergence guarantees on frustrated topologies. We argue that this dichotomy stems from a mismatched inductive bias: iterative methods abandon the sequential elimination structure that makes exact inference correct. We introduce In-Context Graphical Inference (ICG-I), an autoregressive Graph Transformer that restores this structure by mimicking Variable Elimination with learned, Tensor-Train (TT)-compressed intermediate factors. ICG-I is paired with a Dirichlet output layer and Weighted Conformal Prediction (WCP) for calibrated, distribution-free coverage guarantees under topological shift. We prove that TT compression errors propagate at most linearly through the autoregressive chain, that the Dirichlet-Multinomial loss is a proper scoring rule, and that WCP maintains coverage with a quantifiable degradation under estimated density ratios. In extensive experiments, ICG-I achieves state-of-the-art performance across all benchmarks: it reduces mean absolute error (MAE) from 0.041 (best baseline) to 0.020 on standard instances and achieves an MAE of 0.048 on N=500 frustrated spin glasses, where BP diverges entirely.
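For readers unfamiliar with the sequential elimination structure the abstract refers to, the following is a minimal sketch of classical sum-product Variable Elimination in NumPy. It is not the ICG-I model: there is no learning, no Tensor-Train compression, and no conformal calibration. The function names (`multiply`, `eliminate_marginal`, `brute_force_marginal`), the three-spin frustrated triangle, and the values of `J` and `h` are all illustrative assumptions that do not come from the paper.

```python
import itertools
import numpy as np


def multiply(factors, scope, card):
    """Multiply factor tables into one table whose axes follow `scope`."""
    out = np.ones([card[u] for u in scope])
    for vs, table in factors:
        if not vs:  # scalar factor: plain multiplication
            out = out * table
            continue
        # Reorder the table's axes to match the order of variables in `scope`.
        perm = sorted(range(len(vs)), key=lambda i: scope.index(vs[i]))
        t = np.transpose(table, perm)
        # Insert singleton axes for variables this factor does not mention.
        shape = [card[u] if u in vs else 1 for u in scope]
        out = out * t.reshape(shape)
    return out


def eliminate_marginal(factors, card, query, order):
    """Exact marginal of `query`, summing out the variables in `order`."""
    factors = list(factors)
    for v in order:
        involved = [f for f in factors if v in f[0]]
        factors = [f for f in factors if v not in f[0]]
        if not involved:
            continue
        scope = sorted({u for vs, _ in involved for u in vs})
        prod = multiply(involved, scope, card)
        # Summing out v yields the intermediate factor passed down the chain.
        new_scope = tuple(u for u in scope if u != v)
        factors.append((new_scope, prod.sum(axis=scope.index(v))))
    result = multiply(factors, [query], card)
    return result / result.sum()


def brute_force_marginal(factors, card, query):
    """Reference marginal by enumerating every joint assignment."""
    variables = sorted(card)
    marg = np.zeros(card[query])
    for states in itertools.product(*(range(card[u]) for u in variables)):
        assign = dict(zip(variables, states))
        weight = 1.0
        for vs, table in factors:
            weight *= table[tuple(assign[u] for u in vs)]
        marg[assign[query]] += weight
    return marg / marg.sum()


# Hypothetical toy instance: an antiferromagnetic (frustrated) triangle of
# Ising spins with a small field on spin 0. J and h are arbitrary choices.
J, h = -1.0, 0.3
spins = np.array([-1.0, 1.0])
pair = np.exp(J * np.outer(spins, spins))
field = np.exp(h * spins)
card = {0: 2, 1: 2, 2: 2}
factors = [((0, 1), pair), ((1, 2), pair), ((0, 2), pair), ((0,), field)]

p_ve = eliminate_marginal(factors, card, query=0, order=[1, 2])
p_bf = brute_force_marginal(factors, card, query=0)
print(p_ve, p_bf)
```

Each elimination step produces an intermediate factor over the neighbours of the eliminated variable. On high-treewidth graphs these tables grow exponentially, which is the cost that the abstract says ICG-I addresses with learned, TT-compressed factors.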