🤖 AI Summary
Existing safety-aware reinforcement learning (RL) benchmarks focus predominantly on robotic control and lack the structural complexity required for high-stakes operational domains—such as energy and manufacturing—where problems involve structured constraints, mixed-integer decisions, and hybrid discrete-continuous action spaces.
Method: We introduce the first safety RL benchmark suite grounded in real-world operations research problems, comprising nine industrially realistic planning, scheduling, and control environments, all unified under the Constrained Markov Decision Process (CMDP) framework. We propose novel evaluation mechanisms (cost-based constraint violation modeling, long-horizon planning assessment, and hybrid-action-space evaluation) and leverage the OmniSafe framework for standardized constraint formulation, safe policy optimization, and cross-environment evaluation.
Contribution/Results: We conduct systematic benchmarking of mainstream safe RL algorithms, exposing their practical performance limits. All environments and code are open-sourced to accelerate deployment of trustworthy decision-making AI in safety-critical domains.
📝 Abstract
Most existing safe reinforcement learning (RL) benchmarks focus on robotics and control tasks, offering limited relevance to high-stakes domains that involve structured constraints, mixed-integer decisions, and industrial complexity. This gap hinders the advancement and deployment of safe RL in critical areas such as energy systems, manufacturing, and supply chains. To address this limitation, we present SafeOR-Gym, a benchmark suite of nine operations research (OR) environments tailored for safe RL under complex constraints. Each environment captures a realistic planning, scheduling, or control problem characterized by cost-based constraint violations, long planning horizons, and hybrid discrete-continuous action spaces. The suite integrates seamlessly with the Constrained Markov Decision Process (CMDP) interface provided by OmniSafe. We evaluate several state-of-the-art safe RL algorithms across these environments, revealing a wide range of performance: while some tasks are tractable, others expose fundamental limitations in current approaches. SafeOR-Gym provides a challenging and practical testbed that aims to catalyze future research in safe RL for real-world decision-making problems. The SafeOR-Gym framework and all accompanying code are available at: https://github.com/li-group/SafeOR-Gym.
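To make the CMDP setting concrete, here is a minimal, hypothetical sketch (not the actual SafeOR-Gym or OmniSafe API; all names and dynamics are invented for illustration) of an environment whose `step()` returns a constraint-violation cost separately from the reward, which is the signal safe RL algorithms constrain rather than fold into the objective:

```python
# Hypothetical toy CMDP, illustrative only: an inventory-control task where
# storage capacity is a safety constraint charged as a cost signal,
# kept separate from the reward (the defining feature of a CMDP).
from dataclasses import dataclass

@dataclass
class ToyInventoryCMDP:
    capacity: float = 100.0   # safe storage limit (the constraint)
    demand: float = 20.0      # deterministic per-step demand
    stock: float = 50.0       # current inventory level (the state)
    t: int = 0
    horizon: int = 10

    def reset(self):
        self.stock, self.t = 50.0, 0
        return self.stock

    def step(self, order: float):
        self.stock += order                           # continuous action: order quantity
        sold = min(self.stock, self.demand)
        self.stock -= sold
        reward = sold - 0.1 * order                   # revenue minus ordering cost
        cost = max(0.0, self.stock - self.capacity)   # constraint violation, NOT a reward penalty
        self.t += 1
        terminated = self.t >= self.horizon
        return self.stock, reward, cost, terminated, {}

env = ToyInventoryCMDP()
obs = env.reset()
obs, r, c, done, info = env.step(120.0)  # over-ordering past capacity yields a positive cost
```

A safe RL algorithm would maximize the discounted return while keeping the expected discounted sum of `cost` below a budget, rather than subtracting the cost from the reward.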