🤖 AI Summary
This work addresses the challenge of incorporating safety constraints, such as collision penalties, into continuous-time multi-agent reinforcement learning (MARL), where such constraints introduce discontinuities that disrupt Hamilton–Jacobi–Bellman (HJB)-based learning frameworks. To overcome this, the authors propose a continuous-time constrained Markov decision process formulation that, for the first time in this domain, leverages the epigraph form to recast the discontinuous safety-constrained objective as a smoother, continuously optimizable problem. Building on physics-informed neural networks (PINNs), they design a new actor-critic algorithm that mitigates the discontinuity issue and enables stable policy optimization. Experiments on continuous-time safe multi-particle environments and safe multi-agent MuJoCo benchmarks demonstrate that the proposed method significantly improves the smoothness of value functions and training stability, outperforming existing safe MARL baselines in both safety adherence and task performance.
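For reference, the epigraph reformulation the summary alludes to typically has the following generic shape (a sketch of the standard construction; the paper's exact continuous-time formulation may differ):

```latex
\min_{\pi} \; J(\pi) \quad \text{s.t.} \quad h(\pi) \le 0
\qquad \Longrightarrow \qquad
\min_{z,\,\pi} \; z \quad \text{s.t.} \quad \max\bigl(J(\pi) - z,\; h(\pi)\bigr) \le 0
```

Here $z$ is an auxiliary "epigraph variable" that upper-bounds the cost. The single max-constraint varies continuously in $z$ even when the constraint function $h$ encodes abrupt events such as collisions, which is what makes the reformulated problem better suited to HJB-style value learning.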
📝 Abstract
Multi-agent reinforcement learning (MARL) has made significant progress in recent years, but most algorithms still rely on a discrete-time Markov Decision Process (MDP) with fixed decision intervals. This formulation is often ill-suited for complex multi-agent dynamics, particularly in high-frequency or irregular time-interval settings, leading to degraded performance and motivating the development of continuous-time MARL (CT-MARL). Existing CT-MARL methods are mainly built on Hamilton–Jacobi–Bellman (HJB) equations. However, they rarely account for safety constraints such as collision penalties, since these introduce discontinuities that make HJB-based learning difficult. To address this challenge, we propose a continuous-time constrained MDP (CT-CMDP) formulation and a novel MARL framework that transforms discrete MDPs into CT-CMDPs via an epigraph-based reformulation. We then solve the resulting problem with a novel physics-informed neural network (PINN)-based actor-critic method that enables stable and efficient optimization in continuous time. We evaluate our approach on continuous-time safe multi-particle environments (MPE) and safe multi-agent MuJoCo benchmarks. Results demonstrate smoother value approximations, more stable training, and improved performance over safe MARL baselines, validating the effectiveness and robustness of our method.
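To make the PINN idea concrete: a physics-informed critic is trained by penalizing the residual of the HJB equation at sampled "collocation" states, rather than by bootstrapped temporal-difference targets. The sketch below is a minimal, hypothetical illustration for a one-dimensional, infinite-horizon problem; the network shape, the dynamics `f`, the reward `r`, and the finite-difference gradient are all illustrative assumptions, not the paper's actual architecture or loss.

```python
import numpy as np

def mlp_value(params, x):
    """Tiny one-hidden-layer MLP critic V_theta(x); x has shape (N, 1)."""
    W1, b1, W2, b2 = params
    h = np.tanh(x @ W1 + b1)           # (N, hidden)
    return (h @ W2 + b2).squeeze(-1)   # (N,)

def hjb_residual(params, x, f, r, eps=1e-4):
    """PINN residual r(x) + dV/dx * f(x) for an undiscounted HJB sketch.

    dV/dx is taken by central differences here purely for a dependency-free
    example; in practice one would use autodiff (e.g. JAX or PyTorch).
    """
    dV = (mlp_value(params, x + eps) - mlp_value(params, x - eps)) / (2 * eps)
    return r(x) + dV * f(x)            # (N,)

def pinn_loss(params, xs, f, r):
    """Mean squared HJB residual over sampled collocation states."""
    res = hjb_residual(params, xs, f, r)
    return float(np.mean(res ** 2))

# Toy usage: stable linear dynamics f(x) = -x, quadratic reward r(x) = -x^2.
rng = np.random.default_rng(0)
params = (
    0.1 * rng.normal(size=(1, 8)), np.zeros(8),
    0.1 * rng.normal(size=(8, 1)), np.zeros(1),
)
xs = rng.normal(size=(16, 1))
loss = pinn_loss(params, xs, lambda x: -x[:, 0], lambda x: -(x[:, 0] ** 2))
```

Minimizing this residual loss (typically alongside boundary terms) drives the critic toward a smooth solution of the HJB equation, which is why the epigraph reformulation above, by smoothing the constrained objective, matters for training stability.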