Merging model-based control with multi-agent reinforcement learning for multi-agent cooperative teaming strategies

📅 2026-06-04

🤖 AI Summary

This work addresses the challenge of balancing long-term strategic coordination with the short-term safety and dynamic feasibility of actions in multi-agent systems by proposing a Multi-Agent Actor-Critic Model Predictive Control (MA-AC-MPC) algorithm. MA-AC-MPC integrates the actor-critic reinforcement learning framework with model predictive control: reinforcement learning optimizes cooperative policies over a long horizon, while model predictive control generates short-horizon control commands that respect dynamic constraints and maintain safety. The method is evaluated on a multi-agent pursuit-evasion task and on a heterogeneous platform in which a drone and an omni-wheeled rover cooperate to achieve a landing. In the hardware landing trials, MA-AC-MPC achieves a 100% success rate, compared with 60% for a multilayer perceptron policy baseline (MA-AC-MLP), and the authors report robust hardware performance in both environments.
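The division of labor described above (a learned actor sets long-horizon intent; an MPC turns it into constrained short-horizon commands) can be illustrated with a minimal sketch. Everything in it is an assumption for illustration, not the authors' implementation: each agent is modeled as a 1-D double integrator, a hypothetical linear `LinearActor` outputs a reference position that parameterizes the MPC cost, and the hypothetical `mpc_solve` enforces an acceleration bound by projected gradient descent. The critic and the training loop are omitted; only the policy forward pass is shown.

```python
from dataclasses import dataclass

# All constants are illustrative assumptions, not values from the paper.
DT = 0.1              # control period [s]
HORIZON = 10          # short MPC horizon [steps]
U_MAX = 2.0           # hard acceleration bound [m/s^2]
W_V, W_U = 0.1, 0.01  # velocity and control-effort weights


def mpc_solve(x0, v0, ref, iters=300, lr=2.0):
    """Box-constrained MPC for a 1-D double integrator via projected gradient descent.

    Minimizes sum_k (x_k - ref)^2 + W_V v_k^2 + W_U u_k^2 subject to |u_k| <= U_MAX.
    """
    u = [0.0] * HORIZON
    for _ in range(iters):
        # Roll out the model: v_k = v_{k-1} + DT u_k, x_k = x_{k-1} + DT v_k.
        xs, vs, x, v = [], [], x0, v0
        for uk in u:
            v += DT * uk
            x += DT * v
            xs.append(x)
            vs.append(v)
        # Exact gradient, using dx_k/du_j = DT^2 (k - j + 1) and dv_k/du_j = DT for k >= j.
        grad = []
        for j in range(HORIZON):
            g = 2.0 * W_U * u[j]
            for k in range(j, HORIZON):
                g += 2.0 * (xs[k] - ref) * DT * DT * (k - j + 1)
                g += 2.0 * W_V * vs[k] * DT
            grad.append(g)
        # Gradient step, then projection onto the box keeps every input feasible.
        u = [max(-U_MAX, min(U_MAX, uj - lr * gj)) for uj, gj in zip(u, grad)]
    return u


@dataclass
class LinearActor:
    """Hypothetical actor: maps an observation to the reference the MPC tracks."""
    weights: list
    bias: float

    def __call__(self, obs):
        return sum(w * o for w, o in zip(self.weights, obs)) + self.bias


def team_step(states, actors):
    """One decentralized step for a two-agent team: the actor sets the goal, the MPC acts."""
    commands = []
    for i, (x, v) in enumerate(states):
        teammate_x = states[1 - i][0]
        ref = actors[i]([x, teammate_x])  # long-horizon intent (learned in practice)
        plan = mpc_solve(x, v, ref)       # short-horizon, constraint-respecting plan
        commands.append(plan[0])          # receding horizon: apply only the first input
    return commands


# Usage: hand-picked weights send each agent toward a point offset from its teammate.
actors = [LinearActor([0.0, 1.0], 0.5), LinearActor([0.0, 1.0], -0.5)]
states = [(0.0, 0.0), (5.0, 0.0)]  # (position, velocity) per agent
cmds = team_step(states, actors)
print(cmds)
```

Only the first input of each optimized plan is applied, so the actor can move the MPC goal at every control step while the projection keeps each command within `U_MAX`, whatever the actor outputs.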

📝 Abstract

In this work, we propose a framework that combines multi-agent reinforcement learning (MARL) with model-based control to achieve safe, dynamically feasible actions in cooperative multi-agent tasks. MARL can learn cooperative policies for multi-agent teams from discrete, non-differentiable rewards over a long planning horizon. Model predictive control (MPC) is robust and provides safe, dynamically feasible actions in a fast replanning framework over short horizons. We propose an algorithm that extends actor-critic model predictive control to MARL, which we refer to as multi-agent actor-critic model predictive control (MA-AC-MPC). We demonstrate the capabilities of this algorithm in a multi-agent pursuit-evasion scenario. Specifically, we compare the evader team's strategy under the MA-AC-MPC model and under a multi-layer perceptron model (MA-AC-MLP). The pursuer team uses augmented proportional navigation because it is accepted as an advanced adversarial control law. We also provide an example in a heterogeneous environment in which a drone and an omni-wheeled rover cooperate to achieve repeatable, successful landings, with a 100% success rate in hardware for MA-AC-MPC compared to 60% for MA-AC-MLP. We demonstrate the robustness of the proposed MA-AC-MPC algorithm in hardware for both environments.
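The abstract names augmented proportional navigation (APN) as the pursuers' control law. For reference, the standard textbook planar form of APN commands an acceleration normal to the line of sight (LOS), a_c = N V_c λ̇ + (N/2) a_T⊥, where N is the navigation gain, V_c the closing speed, λ̇ the LOS rate, and a_T⊥ the target's acceleration normal to the LOS. The sketch below implements that planar law only; the 2-D setting, the function name `apn_accel`, and the default gain N = 4 are assumptions, since the paper's exact pursuer implementation is not given here.

```python
import math


def apn_accel(p_pursuer, v_pursuer, p_target, v_target, a_target, nav_gain=4.0):
    """Planar augmented proportional navigation, true-PN form (illustrative sketch).

    Commanded acceleration is normal to the line of sight (LOS):
        a_c = N * V_c * lambda_dot + (N / 2) * a_T_perp
    """
    rx, ry = p_target[0] - p_pursuer[0], p_target[1] - p_pursuer[1]
    vx, vy = v_target[0] - v_pursuer[0], v_target[1] - v_pursuer[1]
    r2 = rx * rx + ry * ry
    r = math.sqrt(r2)
    los_rate = (rx * vy - ry * vx) / r2       # lambda_dot for lambda = atan2(ry, rx)
    closing_speed = -(rx * vx + ry * vy) / r  # V_c > 0 while the range is shrinking
    nx, ny = -ry / r, rx / r                  # unit vector normal to the LOS (CCW)
    a_t_perp = a_target[0] * nx + a_target[1] * ny  # target maneuver normal to the LOS
    a_mag = nav_gain * closing_speed * los_rate + 0.5 * nav_gain * a_t_perp
    return (a_mag * nx, a_mag * ny)


# Usage: pursuer heading +x at 5 m/s; target 10 m ahead, moving and accelerating in +y.
ax, ay = apn_accel((0.0, 0.0), (5.0, 0.0), (10.0, 0.0), (0.0, 1.0), (0.0, 1.0))
print(ax, ay)
```

The (N/2) a_T⊥ term is what distinguishes APN from plain proportional navigation: with a non-maneuvering target it vanishes and the command reduces to N V_c λ̇.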

Problem

Research questions and friction points this paper is trying to address.

multi-agent reinforcement learning

model-based control

cooperative strategies

dynamic feasibility

safety

Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-agent reinforcement learning

model predictive control

actor-critic