🤖 AI Summary
This thesis addresses three key challenges of Bayesian filtering in sequential tasks (online learning, one-step-ahead prediction, and contextual bandits): poor adaptability to non-stationary environments, insufficient robustness to model misspecification and outliers, and computational intractability in the high-dimensional parameter spaces of deep models. Methodologically, it proposes a unified modular adaptive framework comprising: (1) a provably robust filter based on generalised Bayesian updating, ensuring statistical robustness; (2) an approximate second-order sequential optimiser (e.g., a K-FAC variant) that exploits overparametrisation for scalability in high dimensions; and (3) the integration of online variational inference with sequential learning in deep neural networks. Theoretical analysis establishes convergence and robustness guarantees, and empirical evaluation demonstrates significant gains over state-of-the-art online learning and filtering methods in dynamic, high-dimensional, and model-misspecified regimes.
📄 Abstract
In this thesis, we introduce Bayesian filtering as a principled framework for tackling diverse sequential machine learning problems, including online (continual) learning, prequential (one-step-ahead) forecasting, and contextual bandits. To this end, the thesis addresses key challenges in applying Bayesian filtering to these problems: adaptivity to non-stationary environments, robustness to model misspecification and outliers, and scalability to the high-dimensional parameter spaces of deep neural networks. We develop novel tools within the Bayesian filtering framework to address each of these challenges, including: (i) a modular framework that enables the development of adaptive approaches for online learning; (ii) a novel, provably robust filter, based on generalised Bayes, with a computational cost similar to that of standard filters; and (iii) a set of tools for sequentially updating model parameters using approximate second-order optimisation methods that exploit the overparametrisation of high-dimensional parametric models such as neural networks. Theoretical analysis and empirical results demonstrate the improved performance of our methods in dynamic, high-dimensional, and misspecified settings.
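To make the filtering viewpoint concrete, the sketch below implements a tempered, Kalman-style Bayesian update for online linear regression. This is an illustrative toy, not the thesis's actual algorithms: the likelihood-tempering weight `lr` and the forgetting factor `forget` are simplified stand-ins for the generalised-Bayes robustness and non-stationarity mechanisms described above, and all function and parameter names are hypothetical.

```python
import numpy as np

def gb_filter_step(mu, Sigma, x, y, obs_var=1.0, lr=1.0, forget=1.0):
    """One tempered Bayesian filtering step for online linear regression.

    Predict: inflate the covariance by 1/forget, so older observations are
    gradually discounted (a crude handle on non-stationarity).
    Update: conjugate Gaussian (Kalman-style) update with the likelihood
    raised to the power `lr`; lr = 1 recovers standard Bayes, lr < 1
    downweights each observation in the generalised-Bayes spirit.
    """
    # Predict step: discount past information
    Sigma = Sigma / forget
    # Tempering the Gaussian likelihood is equivalent to inflating its noise
    eff_var = obs_var / lr
    S = x @ Sigma @ x + eff_var        # predictive variance of y
    K = Sigma @ x / S                  # Kalman gain
    err = y - x @ mu                   # one-step-ahead prediction error
    mu = mu + K * err
    Sigma = Sigma - np.outer(K, x @ Sigma)
    return mu, Sigma

# Demo: recover a fixed parameter vector from a noisy data stream
rng = np.random.default_rng(0)
theta = np.array([1.0, -2.0])
mu, Sigma = np.zeros(2), 10.0 * np.eye(2)
for _ in range(500):
    x = rng.normal(size=2)
    y = x @ theta + 0.1 * rng.normal()
    mu, Sigma = gb_filter_step(mu, Sigma, x, y, obs_var=0.01)
```

With `lr = 1` and `forget = 1` this is exact conjugate Bayesian linear regression; the thesis's contributions can be read as principled replacements for these two knobs, plus scalable second-order updates when `mu` is the weight vector of a deep network rather than a two-dimensional parameter.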