🤖 AI Summary
To address the slow convergence, suboptimal policies, and poor online safety adaptation inherent in MPC-RL methods for industrial process intelligent control, this paper proposes a novel safety-aware learning framework integrating Compatibility-aware Deterministic Policy Gradient (CDPG) with Multi-Objective Bayesian Optimization (MOBO). For the first time, CDPG-based gradient estimation is embedded into the MOBO pipeline, employing the Expected Hypervolume Improvement (EHVI) acquisition function to enable safe, sample-efficient tuning of MPC parameters under both observation noise and model uncertainty. Numerical experiments demonstrate that the method accelerates convergence by 2.3× over baseline approaches and improves sample efficiency—achieving comparable closed-loop performance with 41% fewer samples—while rigorously ensuring safety and robustness of the controlled system. This work establishes a verifiable, adaptive control paradigm for complex industrial processes.
📝 Abstract
Model Predictive Control (MPC)-based Reinforcement Learning (RL) offers a structured and interpretable alternative to Deep Neural Network (DNN)-based RL methods, with lower computational complexity and greater transparency. However, standard MPC-RL approaches often suffer from slow convergence, suboptimal policy learning due to limited parameterization, and safety issues during online adaptation. To address these challenges, we propose a novel framework that integrates MPC-RL with Multi-Objective Bayesian Optimization (MOBO). The proposed MPC-RL-MOBO utilizes noisy evaluations of the RL stage cost and its gradient, estimated via a Compatible Deterministic Policy Gradient (CDPG) approach, and incorporates them into a MOBO algorithm using the Expected Hypervolume Improvement (EHVI) acquisition function. This fusion enables efficient and safe tuning of the MPC parameters to achieve improved closed-loop performance, even under model imperfections. A numerical example demonstrates the effectiveness of the proposed approach in achieving sample-efficient, stable, and high-performance learning for control systems.