🤖 AI Summary
This paper addresses offline safety-critical control for autonomous systems without access to system dynamics models, online interaction, or expert-designed control barrier functions (CBFs).
Method: We propose a model-free neural CBF learning framework built on a value-guided recursive finite-difference barrier update, combined with expectile regression and updates restricted to the dataset-supported action set. The learned barrier is then used in a quadratic-programming (QP) controller to synthesize safe control in real time, targeting forward invariance of the learned safe set while avoiding barrier queries on out-of-distribution actions.
Contribution/Results: Our method learns a CBF from in-distribution offline data without a dynamics model, online interaction, or expert-designed barriers. Across multiple case studies it yields substantially fewer safety violations than baseline methods while maintaining strong task performance, supporting the deployment of offline-trained safety controllers.
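As a rough illustration of the expectile regression mentioned above, the following minimal sketch implements the standard asymmetric squared loss. It is not taken from the paper: the function name `expectile_loss`, the default `tau=0.9`, and the sign convention for the residual are assumptions made for this example, and the paper may weight the two sides differently.

```python
import numpy as np

def expectile_loss(residual, tau=0.9):
    """Standard expectile (asymmetric squared) loss.

    residual: target minus prediction (sign convention assumed here).
    tau: expectile level in (0, 1); tau = 0.5 gives half the mean squared error.
    """
    residual = np.asarray(residual, dtype=float)
    # Non-negative residuals are weighted by tau, negative ones by 1 - tau.
    weight = np.where(residual < 0.0, 1.0 - tau, tau)
    return float(np.mean(weight * residual ** 2))
```

Because the loss is computed only on residuals from dataset transitions, a regressor trained with it can approximate an upper or lower envelope of the targets without evaluating the network on actions absent from the data.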
📝 Abstract
Ensuring safety in autonomous systems requires controllers that satisfy hard, state-wise constraints without relying on online interaction. Existing safe offline RL methods typically enforce soft expected-cost constraints and therefore do not guarantee forward invariance. Conversely, Control Barrier Functions (CBFs) provide rigorous safety guarantees but usually depend on expert-designed barrier functions or full knowledge of the system dynamics. We introduce Value-Guided Offline Control Barrier Functions (V-OCBF), a framework that learns a neural CBF entirely from offline demonstrations. Unlike prior approaches, V-OCBF does not assume access to the dynamics model; instead, it derives a recursive finite-difference barrier update, enabling model-free learning of a barrier that propagates safety information over time. Moreover, V-OCBF incorporates an expectile-based objective that avoids querying the barrier on out-of-distribution actions and restricts updates to the dataset-supported action set. The learned barrier is then used in a quadratic program (QP) to synthesize safe control in real time. Across multiple case studies, V-OCBF yields substantially fewer safety violations than baseline methods while maintaining strong task performance, highlighting its scalability for offline synthesis of safety-critical controllers without online interaction or hand-engineered barriers.
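To make the QP step concrete, here is a minimal sketch of a generic CBF-style safety filter with a single affine constraint, which admits a closed-form solution. It is an illustration under stated assumptions, not the V-OCBF controller: the function `safety_filter` and the inputs `a` and `b` are hypothetical, and how the affine constraint would be obtained from the learned barrier without a dynamics model is not specified in the abstract.

```python
import numpy as np

def safety_filter(u_ref, a, b):
    """Solve  min_u ||u - u_ref||^2  subject to  a @ u + b >= 0.

    u_ref: nominal (task) control.
    a, b:  coefficients of one affine safety constraint (assumed given).
    Returns the control closest to u_ref that satisfies the constraint.
    """
    u_ref = np.asarray(u_ref, dtype=float)
    a = np.asarray(a, dtype=float)
    slack = a @ u_ref + b
    if slack >= 0.0:
        # The nominal control already satisfies the constraint.
        return u_ref.copy()
    norm_sq = a @ a
    if norm_sq == 0.0:
        raise ValueError("infeasible constraint: a is zero and b < 0")
    # Project u_ref onto the constraint boundary a @ u + b = 0.
    return u_ref - (slack / norm_sq) * a
```

For example, `safety_filter([1.0, 0.0], [-1.0, 0.0], 0.5)` moves the nominal control to the boundary point `[0.5, 0.0]`. With several constraints or actuator limits the closed form no longer applies and a general QP solver would be needed.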