Bridging the Sim-to-Real Gap in Reinforcement Learning-Based Industrial Dispatching through Execution Semantics

📅 2026-05-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a problem in industrial scheduling with reinforcement learning policies: asynchronous event streams lead to temporally inconsistent decision states, undefined action admissibility, and execution errors that are hard to attribute. The paper proposes a policy-neutral execution and measurement layer that sits between the policy and the execution environment. The layer constructs decision-valid snapshots, defines a standardized execution contract, and records execution outcomes as divergences between policy intent, transactional outcome, physical execution, and human intervention. This makes deployment mismatch between simulation and reality observable and attributable, turning undifferentiated execution failures into typed supervisory signals. In a discrete-event simulation, the framework improves diagnostic capability across all observation lag regimes, while operational benefits are strongest under low observation lag, where avoidable errors can be prevented before commitment. The resulting structured data supports policy evaluation and refinement.

📝 Abstract

Event-driven scheduling policies are increasingly deployed in industrial environments, where decisions are made under asynchronous and partially observed system states. As a result, decision states are not temporally consistent, action admissibility is not explicitly defined, and the origin of execution errors remains ambiguous. These issues limit both reliability and interpretability. To address this gap, a policy-neutral execution and measurement layer is proposed to mediate between scheduling policies and the industrial execution environment. The layer constructs decision-valid snapshots from asynchronous event streams, defines a standardized execution contract with explicit action admissibility, and records outcomes as divergences between policy intent, transactional outcomes, physical execution, and human intervention. This enables a separation between decision semantics and execution behavior and makes deployment mismatch observable and structurally attributable. The proposed framework is evaluated using a discrete-event simulation. The results show analytical benefits across all observation lag regimes, as undifferentiated execution failures are transformed into structured, typed outcomes with full attribution coverage. Operational benefits are strongest under low observation lag, where avoidable execution errors can be prevented before commitment. Overall, the layer turns execution uncertainty into supervisory data for evaluation and policy refinement.
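
The three mechanisms named in the abstract (decision-valid snapshots, an execution contract with explicit admissibility, and typed divergence records) can be illustrated with a minimal Python sketch. This is not the paper's implementation: every name here (`Event`, `Snapshot`, `Divergence`, `build_snapshot`, `check_contract`, `classify_outcome`), the staleness threshold `max_age`, the "idle" admissibility rule, and the outcome taxonomy are assumptions made for illustration only.

```python
from dataclasses import dataclass
from enum import Enum


class Divergence(Enum):
    """Hypothetical outcome types; the paper's actual taxonomy may differ."""
    NONE = "none"
    REJECTED_STALE = "rejected_stale_snapshot"
    REJECTED_INADMISSIBLE = "rejected_inadmissible"
    TRANSACTION_FAILED = "transaction_failed"
    PHYSICAL_MISMATCH = "physical_mismatch"
    HUMAN_OVERRIDE = "human_override"


@dataclass(frozen=True)
class Event:
    time: float    # when the event occurred
    machine: str
    status: str    # assumed vocabulary: "idle", "busy", "down"


@dataclass(frozen=True)
class Snapshot:
    taken_at: float
    status: dict   # machine -> last known status
    age: dict      # machine -> time since that machine's last event


def build_snapshot(events, now):
    """Fold an asynchronous event stream into the latest known state per machine."""
    latest = {}
    for ev in sorted(events, key=lambda e: e.time):
        if ev.time <= now:
            latest[ev.machine] = ev
    return Snapshot(
        taken_at=now,
        status={m: ev.status for m, ev in latest.items()},
        age={m: now - ev.time for m, ev in latest.items()},
    )


def check_contract(snapshot, machine, max_age):
    """Return a rejection type, or None if dispatching to `machine` is admissible."""
    if machine not in snapshot.status or snapshot.age[machine] > max_age:
        return Divergence.REJECTED_STALE
    if snapshot.status[machine] != "idle":
        return Divergence.REJECTED_INADMISSIBLE
    return None


def classify_outcome(intended, transaction_ok, executed_on, overridden):
    """Label a committed action by where it diverged from the policy's intent."""
    if overridden:
        return Divergence.HUMAN_OVERRIDE
    if not transaction_ok:
        return Divergence.TRANSACTION_FAILED
    if executed_on != intended:
        return Divergence.PHYSICAL_MISMATCH
    return Divergence.NONE


events = [Event(1.0, "M1", "idle"), Event(2.0, "M2", "busy"), Event(3.0, "M3", "idle")]
snap = build_snapshot(events, now=4.0)
results = {m: check_contract(snap, m, max_age=2.0) for m in ("M1", "M2", "M3")}
```

In this sketch, `results` maps `M1` to a stale-snapshot rejection (its last event is older than `max_age`), `M2` to an inadmissibility rejection, and `M3` to `None`, meaning the action may be committed. Checking the contract before commitment is what lets avoidable errors be caught early when observation lag is low, while `classify_outcome` attributes whatever still goes wrong after commitment.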

Problem

Research questions and friction points this paper is trying to address.

sim-to-real gap

reinforcement learning

industrial dispatching

execution semantics

event-driven scheduling

Innovation

Methods, ideas, or system contributions that make the work stand out.

execution semantics

sim-to-real gap

reinforcement learning