Bootstrapped Model Predictive Control

📅 2025-03-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing model predictive control (MPC) methods suffer from inefficient policy learning in complex continuous-control tasks due to inaccurate value function estimation. This paper proposes a bootstrapped co-optimization framework that integrates MPC with a neural policy network: MPC-generated expert trajectories guide policy training via imitation learning, while the learned policy in turn refines MPC's value-guided action planning. To improve sample efficiency, a lazy reanalyze mechanism amortizes the cost of re-evaluating stored trajectories, and model-based temporal-difference learning improves value estimation accuracy and training stability. To our knowledge, this is the first approach to establish a closed-loop, bidirectional optimization between MPC and a neural policy. Empirical results on high-dimensional locomotion tasks show substantial gains in data efficiency, asymptotic performance, and training robustness; under identical computational budgets, the method achieves superior performance with significantly smaller models.

📝 Abstract
Model Predictive Control (MPC) has been demonstrated to be effective in continuous control tasks. When a world model and a value function are available, planning a sequence of actions ahead of time leads to a better policy. Existing methods typically obtain the value function and the corresponding policy in a model-free manner. However, we find that such an approach struggles with complex tasks, resulting in poor policy learning and inaccurate value estimation. To address this problem, we leverage the strengths of MPC itself. In this work, we introduce Bootstrapped Model Predictive Control (BMPC), a novel algorithm that performs policy learning in a bootstrapped manner. BMPC learns a network policy by imitating an MPC expert, and in turn, uses this policy to guide the MPC process. Combined with model-based TD-learning, our policy learning yields better value estimation and further boosts the efficiency of MPC. We also introduce a lazy reanalyze mechanism, which enables computationally efficient imitation learning. Our method achieves superior performance over prior works on diverse continuous control tasks. In particular, on challenging high-dimensional locomotion tasks, BMPC significantly improves data efficiency while also enhancing asymptotic performance and training stability, with comparable training time and smaller network sizes. Code is available at https://github.com/wertyuilife2/bmpc.
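The bootstrapped loop the abstract describes (MPC acts as an expert, a policy network imitates it, and model-based TD-learning fits the value that in turn guides MPC) can be sketched in a few lines. The toy 1-D world model, the linear policy and value functions, and all hyperparameters below are illustrative assumptions for exposition, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D world model standing in for the learned model (an assumption):
# dynamics s' = s + 0.1*a, reward for staying near the origin.
def model_step(s, a):
    s_next = s + 0.1 * a
    return s_next, -abs(s_next)

def mpc_expert(s, value_fn, horizon=5, n_samples=64):
    """Random-shooting MPC: sample action sequences, score each by its model
    rollout return plus a bootstrapped terminal value, return the best
    sequence's first action."""
    seqs = rng.uniform(-1.0, 1.0, size=(n_samples, horizon))
    best_a, best_ret = 0.0, -np.inf
    for seq in seqs:
        si, ret = s, 0.0
        for a in seq:
            si, r = model_step(si, a)
            ret += r
        ret += value_fn(si)              # value bootstrap at the horizon
        if ret > best_ret:
            best_a, best_ret = seq[0], ret
    return best_a

# Linear "networks" keep the sketch tiny (the paper uses neural networks):
w_pi = 0.0                               # policy:  a    = w_pi * s
w_v = 0.0                                # value:   V(s) = w_v * |s|
value_fn = lambda s: w_v * abs(s)

s = 1.0
for _ in range(200):
    a_star = mpc_expert(s, value_fn)     # MPC provides the expert action
    # Imitation learning: regress the network policy toward the expert.
    w_pi -= 0.1 * (w_pi * s - a_star) * s
    # Model-based TD-learning: one-step target under the world model.
    s_next, r = model_step(s, a_star)
    w_v -= 0.05 * (value_fn(s) - (r + value_fn(s_next))) * abs(s)
    s = s_next if abs(s_next) > 0.05 else 1.0   # reset once near the goal
```

After training, the policy has learned to push the state toward the origin (negative gain) and the value function reflects the negative reward-to-go, closing the loop: the improved value estimate is exactly what `mpc_expert` uses for its terminal bootstrap on the next iteration.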
Problem

Research questions and friction points this paper is trying to address.

Improves policy learning in complex control tasks
Enhances value estimation for Model Predictive Control
Boosts efficiency and stability in high-dimensional locomotion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bootstrapped MPC combines policy learning with MPC
Model-based TD-learning improves value estimation
Lazy reanalyze mechanism enables efficient imitation
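The lazy-reanalyze idea can be illustrated with a replay buffer that caches each sample's expert action and only re-runs MPC on entries whose cached target has gone stale, amortizing the planner's cost. The class name, the age-based staleness rule, and the dummy `mpc_fn` below are illustrative assumptions, not the paper's exact mechanism:

```python
import random

random.seed(0)  # make the sketch deterministic

class LazyReanalyzeBuffer:
    """Replay-buffer sketch of lazy reanalysis: each entry stores the MPC
    expert action it was recorded with and the training step at which that
    action was computed. On sampling, only entries older than `max_age`
    steps are re-run through MPC to refresh the imitation target."""

    def __init__(self, max_age=10):
        self.entries = []         # [state, expert_action, step_computed]
        self.max_age = max_age
        self.reanalyze_calls = 0  # MPC re-evaluations actually paid for

    def add(self, state, expert_action, step):
        self.entries.append([state, expert_action, step])

    def sample(self, mpc_fn, step, k=4):
        batch = random.sample(self.entries, min(k, len(self.entries)))
        for e in batch:
            if step - e[2] > self.max_age:   # imitation target too stale
                e[1] = mpc_fn(e[0])          # reanalyze: fresh expert action
                e[2] = step
                self.reanalyze_calls += 1
        return [(st, a) for st, a, _ in batch]

# Usage: a dummy "planner" that negates the state stands in for real MPC.
buf = LazyReanalyzeBuffer(max_age=10)
for i in range(8):
    buf.add(float(i), -float(i), step=0)

fresh = buf.sample(mpc_fn=lambda s: -s, step=5)    # all targets still fresh
calls_fresh = buf.reanalyze_calls                  # no planner calls paid
stale = buf.sample(mpc_fn=lambda s: -s, step=20)   # sampled targets now stale
calls_stale = buf.reanalyze_calls                  # one call per sampled entry
```

The point of the design is that planner invocations scale with how often targets actually expire, not with how often the buffer is sampled.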
Yuhang Wang
National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, National Engineering Research Center for Visual Information and Application, Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, Xi’an, China
Hanwei Guo
National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, National Engineering Research Center for Visual Information and Application, Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, Xi’an, China
Sizhe Wang
Washington University in Saint Louis
Long Qian
National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, National Engineering Research Center for Visual Information and Application, Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, Xi’an, China
Xuguang Lan
National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, National Engineering Research Center for Visual Information and Application, Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, Xi’an, China