Skill-based Safe Reinforcement Learning with Risk Planning

📅 2025-05-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of constraining high-risk behaviors in safety-critical real-world reinforcement learning (Safe RL), this paper proposes the Safe Skill Planning (SSkP) framework. First, a skill-level risk predictor is trained on offline demonstration data via Positive-Unlabeled (PU) learning, enabling risk modeling without explicit safety labels. Second, an adaptive, risk-aware online planning mechanism integrates risk assessment directly into skill selection and execution. Finally, policy optimization jointly maximizes task performance and safety. Evaluated across multiple robotic simulation benchmarks, SSkP significantly outperforms state-of-the-art Safe RL methods: average task success rate improves by 12.7%, and the safety violation rate decreases by 43.5%. To our knowledge, this is the first work to achieve skill-level, risk-controllable planning grounded in PU learning.

📝 Abstract
Safe Reinforcement Learning (Safe RL) aims to ensure safety when an RL agent conducts learning by interacting with real-world environments where improper actions can induce high costs or lead to severe consequences. In this paper, we propose a novel Safe Skill Planning (SSkP) approach to enhance effective safe RL by exploiting auxiliary offline demonstration data. SSkP involves a two-stage process. First, we employ PU learning to learn a skill risk predictor from the offline demonstration data. Then, based on the learned skill risk predictor, we develop a novel risk planning process to enhance online safe RL and learn a risk-averse safe policy efficiently through interactions with the online RL environment, while simultaneously adapting the skill risk predictor to the environment. We conduct experiments in several benchmark robotic simulation environments. The experimental results demonstrate that the proposed approach consistently outperforms previous state-of-the-art safe RL methods.
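The first stage of the abstract's two-stage process, learning a skill risk predictor from offline demonstrations with PU learning, can be sketched with a non-negative PU risk estimate in the style of Kiryo et al. (2017). This is a minimal illustration, not the paper's implementation: the function names, the sigmoid surrogate loss, and the assumed class prior are all choices of this sketch.

```python
import numpy as np

def sigmoid_loss(scores, label):
    # Logistic (sigmoid) surrogate loss for a label in {+1, -1}:
    # small when the score agrees with the label, large otherwise.
    return 1.0 / (1.0 + np.exp(label * scores))

def nnpu_risk(scores_pos, scores_unl, prior):
    """Non-negative PU risk estimate for a skill risk predictor.

    scores_pos -- predictor scores on skills known to be risky (positives)
    scores_unl -- predictor scores on unlabeled skills
    prior      -- assumed fraction of risky skills in the unlabeled data
    """
    # Risk on the labeled risky skills, weighted by the class prior.
    risk_pos = prior * sigmoid_loss(scores_pos, +1).mean()
    # Negative-class risk estimated from unlabeled data, corrected for the
    # risky fraction it contains; clipped at zero to keep it non-negative.
    risk_neg = (sigmoid_loss(scores_unl, -1).mean()
                - prior * sigmoid_loss(scores_pos, -1).mean())
    return risk_pos + max(risk_neg, 0.0)
```

Minimizing this estimate over the predictor's parameters yields a risk classifier without ever needing explicitly safe-labeled demonstrations, which is the property the abstract exploits.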
Problem

Research questions and friction points this paper is trying to address.

Ensuring RL agent safety in real-world interactions
Learning skill risk predictor from offline data
Enhancing online safe RL with risk planning
Innovation

Methods, ideas, or system contributions that make the work stand out.

PU learning for skill risk prediction
Risk planning enhances online safe RL
Adapts risk predictor to environment dynamically
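The risk-planning idea in the points above can be sketched as a filter-then-maximize skill selection step: among candidate skills, keep those whose predicted risk fits a budget, then pick the one with the highest value estimate. All names here (`risk_predictor`, `value_fn`, `risk_budget`) are illustrative assumptions, not interfaces from the paper.

```python
import numpy as np

def plan_skill(state, candidate_skills, risk_predictor, value_fn, risk_budget):
    """Pick the highest-value candidate skill whose predicted risk stays
    within the budget; fall back to the least risky skill if none qualifies.
    """
    risks = np.array([risk_predictor(state, z) for z in candidate_skills])
    values = np.array([value_fn(state, z) for z in candidate_skills])
    safe = risks <= risk_budget
    if safe.any():
        # Among budget-respecting skills, take the most valuable one.
        idx = int(np.flatnonzero(safe)[np.argmax(values[safe])])
    else:
        # No skill meets the budget: fall back to minimizing risk.
        idx = int(np.argmin(risks))
    return candidate_skills[idx]
```

For example, with three skills scored `risks = [0.9, 0.2, 0.4]` and `values = [5.0, 1.0, 3.0]`, a budget of 0.5 rules out the high-value but risky skill 0 and selects skill 2; tightening the budget to 0.1 leaves no safe option, so the planner falls back to the least risky skill 1.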