🤖 AI Summary
Traditional Bayesian inference methods—such as Markov Chain Monte Carlo (MCMC) and variational inference (VI)—require re-running the entire procedure upon receiving new data, resulting in poor computational efficiency. To address this, we propose an amortized contextual Bayesian posterior estimation framework that employs a permutation-invariant Transformer–normalizing flow architecture to perform end-to-end posterior inference directly from input data sequences, eliminating iterative re-computation. Our work is the first to systematically compare the reverse and forward KL variational objectives in contextual Bayesian inference, revealing that reverse KL yields superior predictive performance. Crucially, our architecture explicitly enforces permutation invariance of the posterior with respect to observation order—a property intrinsic to Bayesian inference. Experiments demonstrate significant improvements over MCMC and VI across out-of-distribution generalization, model misspecification, and sim-to-real transfer tasks, with gains in both predictive accuracy and computational efficiency.
📝 Abstract
Bayesian inference provides a natural way of incorporating prior beliefs and assigning a probability measure to the space of hypotheses. Current solutions rely on iterative routines like Markov Chain Monte Carlo (MCMC) sampling and Variational Inference (VI), which need to be re-run whenever new observations become available. Amortization, through conditional estimation, is a viable strategy to alleviate such difficulties and has been the guiding principle behind simulation-based inference, neural processes, and in-context methods using pre-trained models. In this work, we conduct a thorough comparative analysis of amortized in-context Bayesian posterior estimation methods through the lens of different optimization objectives and architectural choices. Such methods train an amortized estimator to perform posterior parameter inference by conditioning on a set of data examples passed as context to a sequence model such as a transformer. In contrast to language models, we leverage permutation-invariant architectures, since the true posterior is invariant to the ordering of context examples. Our empirical study includes generalization to out-of-distribution tasks, cases where the assumed underlying model is misspecified, and transfer from simulated to real problems. The results highlight the superiority of the reverse KL estimator for predictive problems, especially when combined with the transformer architecture and normalizing flows.
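The permutation-invariance property can be illustrated with a minimal NumPy sketch (all names here are illustrative, not the paper's implementation): self-attention without positional encodings is equivariant to reordering of the context set, so mean-pooling its outputs yields an order-invariant summary, which could then condition a posterior estimator such as a normalizing flow.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # single-head self-attention with NO positional encodings:
    # permuting the rows (observations) of X permutes the output
    # rows in exactly the same way (permutation equivariance)
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[-1]), axis=-1)
    return A @ V

def set_summary(X, Wq, Wk, Wv):
    # mean-pooling over the observation axis turns equivariance
    # into invariance: the summary ignores observation order
    return self_attention(X, Wq, Wk, Wv).mean(axis=0)

rng = np.random.default_rng(0)
d = 4
X = rng.normal(size=(5, d))                      # 5 context observations
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
perm = rng.permutation(5)

s1 = set_summary(X, Wq, Wk, Wv)                  # original ordering
s2 = set_summary(X[perm], Wq, Wk, Wv)            # shuffled ordering
assert np.allclose(s1, s2)                       # identical summary
```

In the paper's setting, such an order-invariant summary of the context would serve as the conditioning input of the amortized posterior, so the estimated posterior inherits the invariance that the true Bayesian posterior possesses.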