🤖 AI Summary
This work addresses the lack of safety guarantees in offline multi-agent reinforcement learning by integrating neural individual control barrier functions into a diffusion model. This lets the diffusion model generate trajectories that satisfy safety constraints without any online interaction with the environment. An inverse dynamics model then recovers control policies by inferring the actions that produce the generated trajectories. The authors state that, to the best of their knowledge, this is the first method to combine control barrier functions with diffusion models for safe policy learning in offline multi-agent settings. Experiments on multiple benchmark tasks show substantially improved safety, with cumulative rewards comparable to those of state-of-the-art methods.
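For readers unfamiliar with control barrier functions, the standard discrete-time condition is sketched below. This is the textbook form, not necessarily the exact formulation used in the paper. Here $h_i$ is a barrier function for agent $i$, $x_t$ is the state at step $t$, and $\alpha$ is a decay-rate parameter. All of these symbols are introduced only for this illustration. If the condition holds at every step and the trajectory starts inside the safe set, it stays inside the safe set.

```latex
\mathcal{C}_i = \{\, x : h_i(x) \ge 0 \,\}, \qquad
h_i(x_{t+1}) - h_i(x_t) \ge -\alpha\, h_i(x_t), \quad 0 < \alpha \le 1

% Rearranging gives h_i(x_{t+1}) \ge (1-\alpha)\, h_i(x_t).
% Since 1 - \alpha \ge 0, induction from h_i(x_0) \ge 0 yields
% h_i(x_t) \ge 0 for all t, i.e. x_t \in \mathcal{C}_i (forward invariance).
```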
📝 Abstract
Offline reinforcement learning allows control policies to be learned directly from data without online interaction, making it suitable for safety-critical tasks. Recent studies have applied diffusion models to offline reinforcement learning to leverage their strong capacity for modeling complex data distributions. However, existing approaches primarily focus on single-agent settings, leaving the safety challenges in multi-agent environments largely unexplored. In this work, we propose a safe offline multi-agent reinforcement learning algorithm that embeds neural individual control barrier functions into the diffusion model to enhance safety during trajectory generation, with control policies recovered through inverse dynamics. We evaluate our algorithm across diverse benchmarks, demonstrating substantial safety improvements while maintaining competitive rewards.
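The sketch below illustrates two ingredients the abstract mentions. The first is checking a per-agent barrier condition on a generated multi-agent trajectory. The second is recovering actions from that trajectory by inverse dynamics. It is not the authors' method, and it rests on several assumptions:

- A hand-written collision-avoidance barrier `h_i(p) = min_j ||p_i - p_j||^2 - d_min^2` stands in for the learned neural barriers.
- Known single-integrator dynamics `p_{t+1} = p_t + dt * u_t` stand in for the learned inverse dynamics model.
- The trajectory is only checked, not steered during diffusion sampling.

The helper names `barrier`, `cbf_satisfied`, and `inverse_dynamics` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]          # 2-D position of one agent
Frame = List[Point]                  # positions of all agents at one time step
Trajectory = List[Frame]             # a generated multi-agent trajectory


def barrier(i: int, positions: Frame, d_min: float) -> float:
    """Hypothetical individual barrier for agent i (not the paper's learned one).

    h_i >= 0 exactly when agent i is at least d_min away from every other agent.
    Requires at least two agents, otherwise min() has nothing to compare.
    """
    xi, yi = positions[i]
    return min(
        (xi - xj) ** 2 + (yi - yj) ** 2
        for j, (xj, yj) in enumerate(positions)
        if j != i
    ) - d_min ** 2


def cbf_satisfied(traj: Trajectory, d_min: float, alpha: float) -> bool:
    """Check h_i(x_{t+1}) >= (1 - alpha) * h_i(x_t) for every agent i and step t."""
    for t in range(len(traj) - 1):
        for i in range(len(traj[t])):
            h_now = barrier(i, traj[t], d_min)
            h_next = barrier(i, traj[t + 1], d_min)
            if h_next < (1.0 - alpha) * h_now:
                return False
    return True


def inverse_dynamics(traj: Trajectory, dt: float) -> List[List[Point]]:
    """Recover per-agent actions assuming single-integrator dynamics.

    Under p_{t+1} = p_t + dt * u_t, the action is u_t = (p_{t+1} - p_t) / dt.
    A learned inverse dynamics model would replace this closed form.
    """
    return [
        [((x1 - x0) / dt, (y1 - y0) / dt)
         for (x0, y0), (x1, y1) in zip(traj[t], traj[t + 1])]
        for t in range(len(traj) - 1)
    ]
```

For example, two agents approaching each other along the x-axis satisfy the condition with `alpha=0.9` but violate it with `alpha=0.5`, because the barrier value drops too quickly between steps. Smaller `alpha` makes the condition more conservative. Only the check depends on `alpha`; the recovered actions are the same either way.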