SafeOR-Gym: A Benchmark Suite for Safe Reinforcement Learning Algorithms on Practical Operations Research Problems

📅 2025-06-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Existing safety-aware reinforcement learning (RL) benchmarks focus predominantly on robotic control and lack the structural complexity required for high-stakes operational domains such as energy and manufacturing, where problems involve structured constraints, mixed-integer decisions, and hybrid discrete-continuous action spaces. Method: We introduce the first safe RL benchmark suite grounded in real-world operations research problems, comprising nine industrially realistic planning, scheduling, and control environments, all unified under the Constrained Markov Decision Process (CMDP) framework. The environments feature cost-based constraint violation modeling, long planning horizons, and hybrid discrete-continuous action spaces, and they build on the OmniSafe framework for standardized constraint formulation, safe policy optimization, and cross-environment evaluation. Contribution/Results: We systematically benchmark mainstream safe RL algorithms, exposing their practical performance limits. All environments and code are open-sourced to accelerate deployment of trustworthy decision-making AI in safety-critical domains.
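For reference, the CMDP framework named in the summary is usually stated as the constrained problem below. This is the standard textbook form, not necessarily the paper's own notation: the symbols (policy \pi, reward r, cost functions c_i, cost budgets d_i, discount \gamma, horizon T) are assumptions introduced here for illustration.

```latex
\max_{\pi} \; \mathbb{E}_{\tau \sim \pi}\!\left[ \sum_{t=0}^{T} \gamma^{t}\, r(s_t, a_t) \right]
\quad \text{s.t.} \quad
\mathbb{E}_{\tau \sim \pi}\!\left[ \sum_{t=0}^{T} \gamma^{t}\, c_i(s_t, a_t) \right] \le d_i,
\qquad i = 1, \dots, m
```

Under this reading, a "cost-based constraint violation" is a nonnegative signal c_i that the environment emits alongside the reward, and a safe RL algorithm tries to keep its expected cumulative value below the budget d_i.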

📝 Abstract
Most existing safe reinforcement learning (RL) benchmarks focus on robotics and control tasks, offering limited relevance to high-stakes domains that involve structured constraints, mixed-integer decisions, and industrial complexity. This gap hinders the advancement and deployment of safe RL in critical areas such as energy systems, manufacturing, and supply chains. To address this limitation, we present SafeOR-Gym, a benchmark suite of nine operations research (OR) environments tailored for safe RL under complex constraints. Each environment captures a realistic planning, scheduling, or control problem characterized by cost-based constraint violations, planning horizons, and hybrid discrete-continuous action spaces. The suite integrates seamlessly with the Constrained Markov Decision Process (CMDP) interface provided by OmniSafe. We evaluate several state-of-the-art safe RL algorithms across these environments, revealing a wide range of performance: while some tasks are tractable, others expose fundamental limitations in current approaches. SafeOR-Gym provides a challenging and practical testbed that aims to catalyze future research in safe RL for real-world decision-making problems. The SafeOR-Gym framework and all accompanying code are available at: https://github.com/li-group/SafeOR-Gym.
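To make the ingredients named in the abstract concrete (a cost signal separate from the reward, a finite planning horizon, and a hybrid discrete-continuous action), here is a minimal pure-Python sketch. It is a hypothetical toy inventory problem written for illustration only: the class `ToyInventoryCMDP`, its parameters, and the two policies are assumptions, and it does not reproduce any SafeOR-Gym environment or the OmniSafe API.

```python
import random
from dataclasses import dataclass


@dataclass
class ToyInventoryCMDP:
    """Hypothetical single-item inventory problem (not a SafeOR-Gym environment)."""
    capacity: float = 10.0         # storage limit; exceeding it is a violation
    horizon: int = 20              # number of planning periods
    fixed_order_cost: float = 1.0  # charged whenever an order is placed
    unit_price: float = 2.0        # revenue per unit sold
    seed: int = 0

    def reset(self) -> float:
        self.rng = random.Random(self.seed)
        self.t = 0
        self.stock = 5.0
        return self.stock

    def step(self, place_order: int, quantity: float):
        """Hybrid action: a binary decision plus a continuous order quantity."""
        ordered = max(quantity, 0.0) if place_order == 1 else 0.0
        self.stock += ordered
        # Cost signal: amount by which the storage constraint is violated.
        cost = max(self.stock - self.capacity, 0.0)
        self.stock = min(self.stock, self.capacity)
        demand = self.rng.uniform(0.0, 4.0)
        sold = min(self.stock, demand)
        self.stock -= sold
        # Reward: sales revenue minus the fixed ordering charge.
        reward = self.unit_price * sold - self.fixed_order_cost * place_order
        self.t += 1
        done = self.t >= self.horizon
        return self.stock, reward, cost, done


def rollout(env, policy):
    """Run one episode and return (total reward, total cost)."""
    state = env.reset()
    total_reward, total_cost, done = 0.0, 0.0, False
    while not done:
        place_order, quantity = policy(state)
        state, reward, cost, done = env.step(place_order, quantity)
        total_reward += reward
        total_cost += cost
    return total_reward, total_cost


def cautious(state):
    # Reorder only when stock is low, and stay one unit below capacity.
    return (1, 9.0 - state) if state < 4.0 else (0, 0.0)


def greedy(state):
    # Always order a large batch, ignoring the storage limit.
    return (1, 8.0)


if __name__ == "__main__":
    env = ToyInventoryCMDP()
    print("cautious:", rollout(env, cautious))
    print("greedy:  ", rollout(env, greedy))
```

Keeping the cost as a separate return value, rather than folding it into the reward as a penalty, is what lets a CMDP-style algorithm enforce a budget on violations instead of tuning a penalty weight.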
Problem

Research questions and friction points this paper is trying to address.

Lack of safe RL benchmarks for structured constraints and industrial complexity
Need for safe RL in energy, manufacturing, and supply chains
Current safe RL approaches struggle with hybrid action spaces and constraints
Innovation

Methods, ideas, or system contributions that make the work stand out.

SafeOR-Gym benchmark for safe RL
Integrates with CMDP interface
Hybrid discrete-continuous action spaces
Authors
Asha Ramanujam
Ph.D. candidate, Purdue University
Optimization, Process systems engineering
Adam Elyoumi
Davidson School of Chemical Engineering, Purdue University, West Lafayette, IN
Hao Chen
Davidson School of Chemical Engineering, Purdue University, West Lafayette, IN
Sai Madhukiran Kompalli
Graduate Student, Purdue University
Mathematical Programming, Explainable Optimization, AI for Sequential Decision Problems
A. Ahluwalia
Davidson School of Chemical Engineering, Purdue University, West Lafayette, IN
Shraman Pal
PhD, Chemical Engineering, Purdue University
Deep Learning, Process Systems Engineering, Optimization
Dimitri J. Papageorgiou
ExxonMobil Corporate Strategic Research
Energy & Power Systems, Optimization, Operations Research, Supply Chain, Machine Learning
Can Li
Davidson School of Chemical Engineering, Purdue University, West Lafayette, IN