HOFLON: Hybrid Offline Learning and Online Optimization for Process Start-Up and Grade-Transition Control

📅 2025-10-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Start-ups and grade transitions in continuous process plants rely heavily on expert knowledge, posing significant challenges for knowledge retention and transfer. Existing offline reinforcement learning (RL) methods suffer from distributional shift and value overestimation, limiting generalization beyond the support of historical data. To address this, we propose a hybrid offline learning and online optimization framework: first, a long-horizon Q-critic is trained on historical operational logs to predict cumulative rewards; second, implicit manifold constraints, which enforce process feasibility, are combined with Q-critic-guided one-step online optimization to safely extrapolate beyond the data coverage. Evaluated on two industrial case studies, a polymerization reactor start-up and a paper machine grade change, the method significantly outperforms state-of-the-art offline RL baselines (e.g., IQL) and surpasses the cumulative reward achieved by historical expert operations.
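The offline stage described above (a long-horizon Q-critic regressed on historical logs) can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the synthetic log, the discounted Monte-Carlo return targets, and the linear Q model are all placeholder assumptions standing in for whatever function class HOFLON actually uses.

```python
import numpy as np

# Toy "historical operational log": one 50-step episode of (state, action, reward).
# In HOFLON these would come from past start-up / grade-change records.
rng = np.random.default_rng(0)
states = rng.normal(size=(50, 2))
actions = rng.normal(size=(50, 1))
rewards = -np.sum(states**2, axis=1)

# Long-horizon targets: discounted returns computed backward through the episode.
gamma = 0.99
returns = np.zeros(50)
g = 0.0
for t in reversed(range(50)):
    g = rewards[t] + gamma * g
    returns[t] = g

# Fit a linear Q(s, a) ~ w . [s, a, 1] to the return targets by least squares.
# A linear model is an illustrative stand-in for the learned Q-critic.
features = np.hstack([states, actions, np.ones((50, 1))])
w, *_ = np.linalg.lstsq(features, returns, rcond=None)

def q_critic(state, action):
    """Predicted cumulative reward for a state-action pair."""
    return float(np.concatenate([state, action, [1.0]]) @ w)
```

Once trained, `q_critic` scores candidate actions online without needing a process model; the online stage then searches over actions against this score.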

📝 Abstract
Start-ups and product grade-changes are critical steps in continuous-process plant operation, because any misstep immediately affects product quality and drives operational losses. These transitions have long relied on manual operation by a handful of expert operators, but the progressive retirement of that workforce is leaving plant owners without the tacit know-how needed to execute them consistently. In the absence of a process model, offline reinforcement learning (RL) promises to capture and even surpass human expertise by mining historical start-up and grade-change logs, yet standard offline RL struggles with distribution shift and value-overestimation whenever a learned policy ventures outside the data envelope. We introduce HOFLON (Hybrid Offline Learning + Online Optimization) to overcome those limitations. Offline, HOFLON learns (i) a latent data manifold that represents the feasible region spanned by past transitions and (ii) a long-horizon Q-critic that predicts the cumulative reward from state-action pairs. Online, it solves a one-step optimization problem that maximizes the Q-critic while penalizing deviations from the learned manifold and excessive rates of change in the manipulated variables. We test HOFLON on two industrial case studies: a polymerization reactor start-up and a paper-machine grade-change problem, and benchmark it against Implicit Q-Learning (IQL), a leading offline-RL algorithm. In both plants HOFLON not only surpasses IQL but also delivers, on average, better cumulative rewards than the best start-up or grade-change observed in the historical data, demonstrating its potential to automate transition operations beyond current expert capability.
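The online stage in the abstract is a one-step optimization: maximize the Q-critic while penalizing deviation from the learned data manifold and excessive rates of change in the manipulated variables. The sketch below shows that objective structure only; the quadratic Q-critic, the toy box-shaped "manifold" penalty, and the weights `lam` and `mu` are hypothetical stand-ins, not the paper's learned models or tuning.

```python
import numpy as np
from scipy.optimize import minimize

def q_critic(state, action):
    # Placeholder Q-critic: reward peaks at action = 0.5 * state.
    return -np.sum((action - 0.5 * state) ** 2)

def manifold_penalty(state, action):
    # Stand-in for the latent-manifold deviation: distance of the joint
    # (state, action) vector from a toy feasible box [-1, 1]^n.
    z = np.concatenate([state, action])
    return np.sum((z - np.clip(z, -1.0, 1.0)) ** 2)

def hoflon_step(state, prev_action, lam=10.0, mu=1.0):
    """One-step online optimization: maximize Q minus manifold-deviation
    and rate-of-change penalties, as described in the abstract."""
    def objective(action):
        return (-q_critic(state, action)
                + lam * manifold_penalty(state, action)
                + mu * np.sum((action - prev_action) ** 2))
    res = minimize(objective, prev_action, method="L-BFGS-B")
    return res.x

state = np.array([0.4, -0.2])
action = hoflon_step(state, prev_action=np.zeros(2))
```

The rate-of-change term plays the role of a move-suppression penalty familiar from MPC, keeping each online step a small, safe deviation from the previous action.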
Problem

Research questions and friction points this paper is trying to address.

Automating process start-ups and product grade-transitions in industrial plants
Overcoming limitations of offline reinforcement learning for industrial control
Capturing expert operational knowledge despite workforce retirement challenges
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid offline learning with online optimization framework
Learns latent data manifold and long-horizon Q-critic
Online optimization maximizes Q-critic with deviation penalties
Alex Durkin
Department of Chemical Engineering, Imperial College London, SW7 2AZ, UK
Jasper Stolte
Shell Information Technology International BV, 1031 HW Amsterdam, NL
Mehmet Mercangöz
ABB Reader in Autonomous Industrial Systems at Imperial College London
Control Systems · Renewable Energy · Energy Storage · Machine Learning · Process Systems Engineering