🤖 AI Summary
This work addresses the challenge of effectively integrating unstructured priors (e.g., textual features) and quantifying uncertainty for exploration in online decision-making. We propose a novel autoregressive sequence modeling paradigm wherein exploration is formulated as a generative task—predicting missing future outcomes—replacing conventional parameter sampling and obviating explicit prior/posterior updates. Our approach uses in-context learning for dynamic adaptation and establishes a meta-bandit framework that jointly models textual context and decision logic. We theoretically prove that offline predictive performance guarantees online decision reliability. Empirical evaluation on semi-synthetic news recommendation tasks demonstrates a substantial reduction in exploration cost alongside improved text-aware recommendation accuracy. Key contributions include: (i) a generative uncertainty-aware exploration mechanism grounded in sequence prediction; (ii) the integration of in-context learning into bandit-style online decision-making; and (iii) a unified meta-bandit architecture for end-to-end processing of heterogeneous textual and sequential decision signals.
📝 Abstract
We pose uncertainty quantification and exploration in online decision-making as a problem of training and generation from an autoregressive sequence model, an area experiencing rapid innovation. Our approach rests on viewing uncertainty as arising from missing future outcomes that would be revealed through appropriate action choices, rather than from unobservable latent parameters of the environment. This reformulation aligns naturally with modern machine learning capabilities: we can i) train generative models through next-outcome prediction rather than fit explicit priors, ii) assess uncertainty through autoregressive generation rather than parameter sampling, and iii) adapt to new information through in-context learning rather than explicit posterior updating. To showcase these ideas, we formulate a challenging meta-bandit problem where effective performance requires leveraging unstructured prior information (like text features) while exploring judiciously to resolve key remaining uncertainties. We validate our approach through both theory and experiments. Our theory establishes a reduction, showing success at offline next-outcome prediction translates to reliable online uncertainty quantification and decision-making, even with strategically collected data. Semi-synthetic experiments show our insights bear out in a news-article recommendation task, where article text can be leveraged to minimize exploration.
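The three capabilities listed in the abstract—next-outcome prediction, uncertainty via autoregressive generation, and adaptation without posterior updates—can be illustrated with a deliberately simplified sketch. The paper trains a sequence model over outcomes; here we substitute a hypothetical Pólya-urn predictive (equivalent to a Beta-Bernoulli model), whose next-outcome probability conditions on both observed and previously generated rewards. Averaging a long generated future then behaves like a posterior sample of each arm's mean, so picking the arm with the best imputed future performs Thompson-style exploration without ever representing latent parameters. All function names and the urn model are illustrative choices, not the paper's architecture.

```python
import random


def generate_future(successes, failures, horizon, alpha=1.0, beta=1.0, rng=random):
    """Autoregressively impute missing future outcomes for one arm.

    Each generated outcome conditions on the real history plus outcomes
    generated so far (a Polya-urn scheme; its one-step predictive matches
    a Beta(alpha, beta)-Bernoulli model). Returns the imputed mean reward
    over the generated horizon.
    """
    s, f = successes, failures
    for _ in range(horizon):
        p = (alpha + s) / (alpha + beta + s + f)  # next-outcome prediction
        if rng.random() < p:
            s += 1
        else:
            f += 1
    return (s - successes) / horizon


def choose_arm(counts, horizon=500, rng=random):
    """Exploration via generation: sample one plausible future per arm
    and act greedily with respect to the imputed long-run means."""
    scores = [generate_future(s, f, horizon, rng=rng) for s, f in counts]
    return max(range(len(counts)), key=lambda i: scores[i])


# Toy usage: arm 0 has history 50 successes / 5 failures, arm 1 the reverse.
# A generated future for arm 0 will almost always look better, so the
# chooser exploits; with sparse, ambiguous histories the generated futures
# vary more, which is exactly what drives exploration.
arm = choose_arm([(50, 5), (5, 50)], rng=random.Random(0))
```

Note the absence of any explicit posterior update: new observations simply extend the conditioning history, which is the in-context-adaptation analogue in this toy setting.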