🤖 AI Summary
To address the slow convergence and high experimental resource consumption in closed-loop sequential decision-making and control, this paper proposes Temporal-aware Bayesian Optimization (TBO), the first framework to explicitly incorporate temporal intermediate performance feedback, observed within a single experiment, into black-box optimization. Methodologically, TBO constructs a joint temporal probabilistic surrogate model to enable early performance prediction and introduces a theoretically grounded, probabilistic early-stopping criterion that adaptively terminates unpromising experiments. Theoretical analysis guarantees convergence, while empirical evaluation demonstrates substantial efficiency gains: in simulation, TBO achieves baseline performance using only about 50% of the experimental budget; under identical resource constraints, it significantly outperforms conventional Bayesian optimization and reinforcement learning baselines in closed-loop control performance. These results validate TBO’s efficacy and practical applicability for resource-efficient sequential optimization.
📝 Abstract
Closed-loop performance of sequential decision-making algorithms, such as model predictive control, depends strongly on the parameters of cost functions, models, and constraints. Bayesian optimization is a common approach to learning these parameters based on closed-loop experiments. However, traditional Bayesian optimization approaches treat the learning problem as a black box, ignoring valuable information and knowledge about the structure of the underlying problem, resulting in slow convergence and high experimental resource use. We propose a time-series-informed optimization framework that incorporates intermediate performance evaluations from early iterations of each experimental episode into the learning procedure. Additionally, probabilistic early stopping criteria are proposed to terminate unpromising experiments, significantly reducing experimental time. Simulation results show that our approach achieves baseline performance with approximately half the resources. Moreover, with the same resource budget, our approach outperforms the baseline in terms of final closed-loop performance, highlighting its efficiency in sequential decision-making scenarios.
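The abstract does not spell out the stopping rule, so the following is only a minimal sketch of what a probabilistic early-stopping check could look like, not the authors' criterion. It assumes that the closed-loop cost is minimized, that a surrogate model supplies a Gaussian prediction (mean and standard deviation) of an episode's final cost from its intermediate evaluations, and that an episode is abandoned when its probability of beating the best cost seen so far falls below a threshold. The function names, the Gaussian assumption, and the default threshold of 0.05 are all hypothetical.

```python
import math


def prob_improvement(pred_mean, pred_std, best_cost):
    """Probability that the final cost ends below best_cost.

    Assumes (hypothetically) a Gaussian prediction of the final cost,
    N(pred_mean, pred_std**2), obtained from intermediate evaluations.
    """
    if pred_std <= 0.0:
        # Degenerate prediction: the outcome is treated as certain.
        return 1.0 if pred_mean < best_cost else 0.0
    z = (best_cost - pred_mean) / pred_std
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def should_stop_early(pred_mean, pred_std, best_cost, threshold=0.05):
    """Stop the running episode if improvement is too unlikely.

    threshold is an illustrative tuning parameter, not a value from the paper.
    """
    return prob_improvement(pred_mean, pred_std, best_cost) < threshold


if __name__ == "__main__":
    best_so_far = 1.0
    # Prediction far above the incumbent: the episode is a stopping candidate.
    print(should_stop_early(pred_mean=10.0, pred_std=1.0, best_cost=best_so_far))
    # Prediction below the incumbent: the episode keeps running.
    print(should_stop_early(pred_mean=0.5, pred_std=1.0, best_cost=best_so_far))
```

In a sketch like this, a lower threshold stops fewer episodes and spends more experimental time, while a higher one saves time at the risk of discarding a parameter set that would have improved late in the episode.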