🤖 AI Summary
This study addresses online learning of the Kalman filter for output and state estimation in partially observable linear dynamical systems with unknown system models. The authors propose a unified algorithmic framework based on online optimization, incorporating a stochastic querying mechanism to handle limited observability. Their theoretical analysis establishes, for the first time, that sublinear regret in state estimation is unattainable without queries, yet a √T regret bound becomes achievable with a finite number of stochastic queries, revealing a fundamental trade-off between query complexity and regret. The proposed algorithm attains a logarithmic regret bound (log T) for output estimation and a √T regret bound for state estimation. Numerical experiments corroborate the theoretical findings and demonstrate the algorithm's empirical effectiveness.
📄 Abstract
In this paper, we study the problem of learning Kalman filtering with an unknown system model in partially observed linear dynamical systems. We propose a unified algorithmic framework based on online optimization that solves both the output estimation and state estimation scenarios. By exploiting properties of the estimation error cost functions, such as conditional strong convexity, we show that our algorithm achieves a $\log T$-regret in the horizon length $T$ for the output estimation scenario. More importantly, we tackle the more challenging scenario of learning Kalman filtering for state estimation, which is an open problem in the literature. We first characterize a fundamental limitation of the problem, demonstrating that no algorithm can achieve sublinear regret in $T$. By further introducing a random query scheme into our algorithm, we show that a $\sqrt{T}$-regret is achievable when the algorithm is granted limited query access to more informative measurements of the system state in practice. Our algorithm and regret bounds readily capture the trade-off between the number of queries and the achieved regret, and shed light on online learning problems with limited observations. We validate the performance of our algorithms using numerical examples.
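As a rough illustration of the online-optimization viewpoint, the sketch below learns a linear map from the last $H$ observations to the next output of an unknown partially observed linear system, using online gradient descent on the squared one-step prediction error. The predictor class, the history length `H`, the $O(1/t)$ step size, and the system matrices are all illustrative assumptions for this sketch, not the paper's exact algorithm or its regret-optimal parameter choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unknown, stable, partially observed LDS (assumed for illustration):
#   x_{t+1} = A x_t + w_t,   y_t = C x_t + v_t
# The learner never sees A, C, the state x_t, or the noise statistics;
# it only receives the outputs y_t one at a time.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
C = np.array([[1.0, 0.0]])
n, m = 2, 1

T, H = 2000, 5                  # horizon and (assumed) history length
x = np.zeros(n)                 # true hidden state
hist = np.zeros(H * m)          # stacked last H observations
M = np.zeros((m, H * m))        # learned linear predictor parameters
losses = []

for t in range(1, T + 1):
    y_pred = M @ hist                             # predict the next output
    y = C @ x + 0.1 * rng.standard_normal(m)      # then observe it
    err = y_pred - y
    losses.append(float(err @ err))               # incurred squared loss
    eta = 1.0 / t                                 # assumed O(1/t) step size
    M -= eta * np.outer(err, hist)                # online gradient step
    hist = np.concatenate([y, hist[:-m]])         # roll the observation history
    x = A @ x + 0.05 * rng.standard_normal(n)     # true state evolves unseen

late = float(np.mean(losses[-T // 4:]))           # average loss, last quarter
```

The average late-stage loss approaches the irreducible innovation variance of this system, which is the behavior a sublinear-regret output estimator should exhibit; the paper's state estimation setting additionally requires the random query mechanism, since the state itself is never observed here.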