🤖 AI Summary
Existing safety-aware reinforcement learning (RL) benchmarks focus predominantly on robotic control and lack the structural complexity required for high-stakes operational domains—such as energy and manufacturing—where problems involve structured constraints, mixed-integer decisions, and hybrid discrete-continuous action spaces.
Method: We introduce the first safety RL benchmark suite grounded in real-world operations research problems, comprising nine industrially realistic planning, scheduling, and control environments, all unified under the Constrained Markov Decision Process (CMDP) framework. We propose novel evaluation mechanisms (cost-based constraint violation modeling, long-horizon planning assessment, and hybrid-action-space evaluation) and leverage the OmniSafe framework for standardized constraint formulation, safe policy optimization, and cross-environment evaluation.
Contribution/Results: We conduct systematic benchmarking of mainstream safe RL algorithms, exposing their practical performance limits. All environments and code are open-sourced to accelerate deployment of trustworthy decision-making AI in safety-critical domains.
📝 Abstract
Most existing safe reinforcement learning (RL) benchmarks focus on robotics and control tasks, offering limited relevance to high-stakes domains that involve structured constraints, mixed-integer decisions, and industrial complexity. This gap hinders the advancement and deployment of safe RL in critical areas such as energy systems, manufacturing, and supply chains. To address this limitation, we present SafeOR-Gym, a benchmark suite of nine operations research (OR) environments tailored for safe RL under complex constraints. Each environment captures a realistic planning, scheduling, or control problem characterized by cost-based constraint violations, long planning horizons, and hybrid discrete-continuous action spaces. The suite integrates seamlessly with the Constrained Markov Decision Process (CMDP) interface provided by OmniSafe. We evaluate several state-of-the-art safe RL algorithms across these environments, revealing a wide range of performance: while some tasks are tractable, others expose fundamental limitations in current approaches. SafeOR-Gym provides a challenging and practical testbed that aims to catalyze future research in safe RL for real-world decision-making problems. The SafeOR-Gym framework and all accompanying code are available at: https://github.com/li-group/SafeOR-Gym.
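To make the CMDP setting concrete, here is a minimal, hypothetical sketch (not the actual SafeOR-Gym or OmniSafe API; all names and dynamics are invented for illustration) of an environment whose `step()` returns a constraint-violation cost separately from the reward, which is the signal safe RL algorithms constrain rather than fold into the objective:

```python
# Hypothetical toy CMDP, illustrative only: an inventory-control task where
# storage capacity is a safety constraint charged as a cost signal,
# kept separate from the reward (the defining feature of a CMDP).
from dataclasses import dataclass

@dataclass
class ToyInventoryCMDP:
    capacity: float = 100.0   # safe storage limit (the constraint)
    demand: float = 20.0      # deterministic per-step demand
    stock: float = 50.0       # current inventory level (the state)
    t: int = 0
    horizon: int = 10

    def reset(self):
        self.stock, self.t = 50.0, 0
        return self.stock

    def step(self, order: float):
        self.stock += order                           # continuous action: order quantity
        sold = min(self.stock, self.demand)
        self.stock -= sold
        reward = sold - 0.1 * order                   # revenue minus ordering cost
        cost = max(0.0, self.stock - self.capacity)   # constraint violation, NOT a reward penalty
        self.t += 1
        terminated = self.t >= self.horizon
        return self.stock, reward, cost, terminated, {}

env = ToyInventoryCMDP()
obs = env.reset()
obs, r, c, done, info = env.step(120.0)  # over-ordering past capacity yields a positive cost
```

A safe RL algorithm would maximize the discounted return while keeping the expected discounted sum of `cost` below a budget, rather than subtracting the cost from the reward.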