SPiDR: A Simple Approach for Zero-Shot Safety in Sim-to-Real Transfer

📅 2025-09-23
🤖 AI Summary
To address the safety risks that arise from discrepancies between simulation and reality in simulation-to-real (Sim-to-Real) transfer for reinforcement learning, this paper proposes a zero-shot safe transfer method. The core idea is to integrate pessimistic optimization into the domain randomization framework: the method explicitly models uncertainty about the simulation–reality gap and incorporates conservative estimates into the safety constraints, yielding policies with provable safety guarantees without real-world fine-tuning. The method is compatible with standard RL training pipelines and supports scalable, end-to-end learning of safety-aware policies. Evaluated on multiple sim-to-sim benchmarks and two physical robot platforms, the approach maintains safety under transfer while preserving strong task performance. Key contributions include: (i) a theoretically grounded pessimistic domain randomization framework for Sim-to-Real safety; (ii) zero-shot safety guarantees under model uncertainty; and (iii) empirical validation of safety and performance across diverse robotic tasks.

📝 Abstract
Safety remains a major concern for deploying reinforcement learning (RL) in real-world applications. Simulators provide safe, scalable training environments, but the inevitable sim-to-real gap introduces additional safety concerns, as policies must satisfy constraints in real-world conditions that differ from simulation. To address this challenge, robust safe RL techniques offer principled methods, but are often incompatible with standard scalable training pipelines. In contrast, domain randomization, a simple and popular sim-to-real technique, stands out as a promising alternative, although it often results in unsafe behaviors in practice. We present SPiDR, short for Sim-to-real via Pessimistic Domain Randomization, a scalable algorithm with provable guarantees for safe sim-to-real transfer. SPiDR uses domain randomization to incorporate the uncertainty about the sim-to-real gap into the safety constraints, making it versatile and highly compatible with existing training pipelines. Through extensive experiments on sim-to-sim benchmarks and two distinct real-world robotic platforms, we demonstrate that SPiDR effectively ensures safety despite the sim-to-real gap while maintaining strong performance.
Problem

Research questions and friction points this paper is trying to address.

Addressing safety concerns in reinforcement learning deployment
Bridging the sim-to-real gap that introduces safety risks
Developing scalable safe RL methods that remain compatible with standard training pipelines
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses pessimistic domain randomization for safety
Incorporates sim-to-real uncertainty into constraints
Maintains compatibility with existing training pipelines
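
The idea of folding sim-to-real uncertainty into the safety constraint can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes, purely for illustration, that the same policy is rolled out under several randomly sampled simulator parameters and that the pessimistic cost is the mean cost plus a penalty proportional to the spread across those domains. The names `pessimistic_cost`, `satisfies_constraint`, `penalty_scale`, and `budget` are hypothetical.

```python
import numpy as np


def pessimistic_cost(costs_per_domain, penalty_scale=1.0):
    """Conservative cost estimate across randomized simulator domains.

    costs_per_domain: cumulative safety cost of one policy, measured under
    each sampled set of simulator parameters (hypothetical input).
    penalty_scale: assumed knob controlling how strongly disagreement
    between domains is penalized; 0 recovers plain domain randomization.
    """
    costs = np.asarray(costs_per_domain, dtype=float)
    # Mean cost is what ordinary domain randomization would constrain;
    # the standard deviation stands in for sim-to-real uncertainty.
    return float(costs.mean() + penalty_scale * costs.std())


def satisfies_constraint(costs_per_domain, budget, penalty_scale=1.0):
    """Check the safety constraint against the pessimistic estimate."""
    return bool(pessimistic_cost(costs_per_domain, penalty_scale) <= budget)


if __name__ == "__main__":
    costs = [1.0, 3.0]  # illustrative costs from two sampled domains
    print(pessimistic_cost(costs, penalty_scale=0.0))
    print(pessimistic_cost(costs, penalty_scale=1.0))
    print(satisfies_constraint(costs, budget=2.5, penalty_scale=0.0))
    print(satisfies_constraint(costs, budget=2.5, penalty_scale=1.0))
```

In this sketch a policy that looks safe on average across domains can still be rejected once disagreement between domains is penalized, which is the intuition behind constraining a pessimistic estimate rather than the average.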