Time-Series-Informed Closed-loop Learning for Sequential Decision Making and Control

📅 2024-12-03

🏛️ arXiv.org

📈 Citations: 2

✨ Influential: 0

🤖 AI Summary

To address slow convergence and high experimental resource consumption in closed-loop sequential decision-making and control, this paper proposes Temporal-aware Bayesian Optimization (TBO), the first framework to explicitly incorporate intermediate performance feedback, observed over time within a single experiment, into black-box optimization. Methodologically, TBO constructs a joint temporal probabilistic surrogate model to enable early performance prediction and introduces a theoretically grounded, probabilistic early-stopping criterion that adaptively terminates unpromising experiments. Theoretical analysis guarantees convergence, while empirical evaluation demonstrates substantial efficiency gains: in simulation, TBO achieves baseline performance using only approximately 50% of the experimental budget, and under identical resource constraints it significantly outperforms conventional Bayesian optimization and reinforcement learning baselines in closed-loop control performance. These results validate TBO’s efficacy and practical applicability for resource-efficient sequential optimization.

📝 Abstract

Closed-loop performance of sequential decision making algorithms, such as model predictive control, depends strongly on the parameters of cost functions, models, and constraints. Bayesian optimization is a common approach to learning these parameters based on closed-loop experiments. However, traditional Bayesian optimization approaches treat the learning problem as a black box, ignoring valuable information and knowledge about the structure of the underlying problem, resulting in slow convergence and high experimental resource use. We propose a time-series-informed optimization framework that incorporates intermediate performance evaluations from early iterations of each experimental episode into the learning procedure. Additionally, probabilistic early stopping criteria are proposed to terminate unpromising experiments, significantly reducing experimental time. Simulation results show that our approach achieves baseline performance with approximately half the resources. Moreover, with the same resource budget, our approach outperforms the baseline in terms of final closed-loop performance, highlighting its efficiency in sequential decision making scenarios.
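To make "intermediate performance evaluations" concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical scalar linear system with a proportional controller and a quadratic stage cost; the function name, system constants, and gains are all illustrative. It records the running cumulative cost after every time step of one episode, which is the kind of partial information a time-series-informed optimizer could use before the episode finishes.

```python
def closed_loop_partial_costs(gain, horizon=50, a=1.2, b=1.0, q=1.0, r=0.1, x0=1.0):
    """Simulate x[k+1] = a*x[k] + b*u[k] with u[k] = -gain*x[k].

    Returns the running cumulative cost after each step, so entry t-1 is the
    closed-loop cost observed after t steps of the episode.
    All system constants are illustrative assumptions.
    """
    x = x0
    running = 0.0
    partial_costs = []
    for _ in range(horizon):
        u = -gain * x                      # proportional state feedback
        running += q * x * x + r * u * u   # nonnegative quadratic stage cost
        partial_costs.append(running)
        x = a * x + b * u                  # advance the closed loop one step
    return partial_costs


# Two candidate controller parameters evaluated on the same episode length.
costs_good = closed_loop_partial_costs(gain=1.0)
costs_poor = closed_loop_partial_costs(gain=0.3)

# After only 10 of 50 steps, the poorer gain has already accumulated more
# cost than the better gain accumulates over the full episode.
poor_is_already_worse = costs_poor[9] > costs_good[-1]
```

Because the stage cost here is nonnegative, each partial cost is a lower bound on the final cost of the same episode, which is why an early prefix of the trajectory can already rule out a parameter choice in this toy setting.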

Problem

Research questions and friction points this paper is trying to address.

Optimizing controller parameters for efficient sequential decision making systems

Exploiting temporal structure in closed-loop trajectories for faster convergence

Reducing experimental resource usage through probabilistic early stopping criteria

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-fidelity Bayesian optimization aligns fidelity with time

Probabilistic early stopping terminates unpromising experiments early

Framework exploits temporal structure for efficient controller tuning
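The probabilistic early stopping idea listed above can be sketched as follows. This is an illustration under stated assumptions, not the criterion from the paper: it assumes a surrogate model supplies a Gaussian prediction (mean and standard deviation) of an episode's final cost, and it stops the episode when the probability of beating the best cost seen so far falls below a threshold. The function names and the default threshold of 0.05 are hypothetical.

```python
import math


def prob_improvement(pred_mean, pred_std, best_cost):
    """P(final cost < best_cost) under a Gaussian prediction of the final cost."""
    if pred_std <= 0.0:
        # Degenerate prediction: the outcome is treated as known exactly.
        return 1.0 if pred_mean < best_cost else 0.0
    z = (best_cost - pred_mean) / pred_std
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def should_stop_early(pred_mean, pred_std, best_cost, threshold=0.05):
    """Stop when improving on the incumbent is sufficiently unlikely."""
    return prob_improvement(pred_mean, pred_std, best_cost) < threshold


# Predicted final cost far above the incumbent: terminate the episode.
stop_bad = should_stop_early(pred_mean=5.0, pred_std=0.5, best_cost=1.0)
# Predicted final cost close to the incumbent: keep running.
stop_close = should_stop_early(pred_mean=1.2, pred_std=0.5, best_cost=1.0)
```

A lower threshold makes the rule more conservative: fewer episodes are cut short, at the price of spending more experimental time on candidates that are unlikely to improve on the incumbent.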
