Intersection of Reinforcement Learning and Bayesian Optimization for Intelligent Control of Industrial Processes: A Safe MPC-based DPG using Multi-Objective BO

📅 2025-07-13

📈 Citations: 0

✨ Influential: 0

career value

192K/year

🤖 AI Summary

To address the slow convergence, suboptimal policies, and poor online safety adaptation inherent in MPC-RL methods for industrial process intelligent control, this paper proposes a novel safety-aware learning framework integrating Compatibility-aware Deterministic Policy Gradient (CDPG) with Multi-Objective Bayesian Optimization (MOBO). For the first time, CDPG-based gradient estimation is embedded into the MOBO pipeline, employing the Expected Hypervolume Improvement (EHVI) acquisition function to enable safe, sample-efficient tuning of MPC parameters under both observation noise and model uncertainty. Numerical experiments demonstrate that the method accelerates convergence by 2.3× over baseline approaches and improves sample efficiency—achieving comparable closed-loop performance with 41% fewer samples—while rigorously ensuring safety and robustness of the controlled system. This work establishes a verifiable, adaptive control paradigm for complex industrial processes.

Technology Category

Application Category

📝 Abstract

Model Predictive Control (MPC)-based Reinforcement Learning (RL) offers a structured and interpretable alternative to Deep Neural Network (DNN)-based RL methods, with lower computational complexity and greater transparency. However, standard MPC-RL approaches often suffer from slow convergence, suboptimal policy learning due to limited parameterization, and safety issues during online adaptation. To address these challenges, we propose a novel framework that integrates MPC-RL with Multi-Objective Bayesian Optimization (MOBO). The proposed MPC-RL-MOBO utilizes noisy evaluations of the RL stage cost and its gradient, estimated via a Compatible Deterministic Policy Gradient (CDPG) approach, and incorporates them into a MOBO algorithm using the Expected Hypervolume Improvement (EHVI) acquisition function. This fusion enables efficient and safe tuning of the MPC parameters to achieve improved closed-loop performance, even under model imperfections. A numerical example demonstrates the effectiveness of the proposed approach in achieving sample-efficient, stable, and high-performance learning for control systems.

Problem

Research questions and friction points this paper is trying to address.

Slow convergence in standard MPC-RL approaches

Suboptimal policy learning due to limited parameterization

Safety issues during online adaptation

Innovation

Methods, ideas, or system contributions that make the work stand out.

MPC-RL with Multi-Objective Bayesian Optimization

Safe tuning using EHVI acquisition function

Compatible Deterministic Policy Gradient approach

🔎 Similar Papers

Balance Reward and Safety Optimization for Safe Reinforcement Learning: A Perspective of Gradient Manipulation