CoRL-MPPI: Enhancing MPPI With Learnable Behaviours For Efficient And Provably-Safe Multi-Robot Collision Avoidance

📅 2025-11-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the suboptimal trajectories and poor scalability of model predictive path integral (MPPI)-based obstacle avoidance in decentralized multi-robot systems, which stem from its uninformed random sampling, this paper proposes CoRL-MPPI, a decentralized control framework that integrates cooperative reinforcement learning with MPPI. Its core contribution is a learned neural policy that guides the MPPI sampling distribution, improving trajectory quality and cooperative efficiency while preserving MPPI's theoretical safety guarantees. The policy is trained from local observations and embedded directly into MPPI's stochastic sampling process. Simulations in high-density dynamic environments show that CoRL-MPPI achieves substantially higher task success rates and shorter average completion times than ORCA, BVC, and a multi-agent MPPI baseline, while maintaining collision-free navigation.

📝 Abstract
Decentralized collision avoidance remains a core challenge for scalable multi-robot systems. One of the promising approaches to tackle this problem is Model Predictive Path Integral (MPPI) -- a framework that is naturally suited to handle any robot motion model and provides strong theoretical guarantees. Still, in practice an MPPI-based controller may produce suboptimal trajectories, as its performance relies heavily on uninformed random sampling. In this work, we introduce CoRL-MPPI, a novel fusion of Cooperative Reinforcement Learning and MPPI to address this limitation. We train an action policy (approximated by a deep neural network) in simulation that learns local cooperative collision avoidance behaviors. This learned policy is then embedded into the MPPI framework to guide its sampling distribution, biasing it towards more intelligent and cooperative actions. Notably, CoRL-MPPI preserves all the theoretical guarantees of regular MPPI. We evaluate our approach in dense, dynamic simulation environments against state-of-the-art baselines, including ORCA, BVC, and a multi-agent MPPI implementation. Our results demonstrate that CoRL-MPPI significantly improves navigation efficiency (measured by success rate and makespan) and safety, enabling agile and robust multi-robot navigation.
Problem

Research questions and friction points this paper is trying to address.

Enhancing decentralized multi-robot collision avoidance with learnable cooperative behaviors
Improving MPPI sampling efficiency through guided reinforcement learning policies
Maintaining theoretical safety guarantees while enabling agile robot navigation
Innovation

Methods, ideas, or system contributions that make the work stand out.

CoRL-MPPI combines reinforcement learning with MPPI
Learned policy guides MPPI sampling for cooperative behaviors
Preserves MPPI theoretical guarantees while improving efficiency
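The guided-sampling idea described above can be sketched in a few lines: instead of perturbing a zero (or previous) nominal control with uninformed noise, MPPI's rollouts are sampled around the action suggested by a learned policy. The sketch below is a minimal single-agent illustration, not the paper's implementation; the `policy` stand-in, the repeated-action nominal sequence, and all parameter values are illustrative assumptions.

```python
import numpy as np

def policy(obs):
    # Hypothetical stand-in for the learned cooperative policy:
    # here it simply steers toward the goal in the observation.
    return np.clip(obs["goal"] - obs["pos"], -1.0, 1.0)

def corl_mppi_step(obs, dynamics, cost, horizon=15, samples=256,
                   sigma=0.3, lam=1.0, rng=None):
    """One MPPI step whose sampling distribution is centred on the
    policy's suggested action rather than an uninformed zero mean."""
    rng = rng or np.random.default_rng(0)
    dim = obs["pos"].shape[0]
    # Nominal control sequence from the policy (simplified here to
    # repeating its current suggestion over the horizon).
    nominal = np.tile(policy(obs), (horizon, 1))           # (H, dim)
    noise = rng.normal(0.0, sigma, (samples, horizon, dim))
    controls = nominal[None] + noise                       # (K, H, dim)

    # Roll out each control sequence and accumulate its cost.
    costs = np.zeros(samples)
    for k in range(samples):
        x = obs["pos"].copy()
        for t in range(horizon):
            x = dynamics(x, controls[k, t])
            costs[k] += cost(x, obs)

    # Standard path-integral (softmin) weighting of the rollouts,
    # so MPPI's usual guarantees are untouched by the biased mean.
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()
    return np.einsum("k,khd->hd", w, controls)[0]          # first action
```

A usage example with trivial integrator dynamics: `corl_mppi_step({"pos": np.zeros(2), "goal": np.array([1.0, 0.0])}, lambda x, u: x + 0.1 * u, lambda x, o: np.linalg.norm(x - o["goal"]))` returns a 2-D action biased toward the goal. Collision costs for neighbouring robots would enter through the `cost` term in the same way.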