Multi-UAV Collision Avoidance using Multi-Agent Reinforcement Learning with Counterfactual Credit Assignment

πŸ“… 2022-04-19
πŸ›οΈ arXiv.org
πŸ“ˆ Citations: 7
✨ Influential: 1
πŸ“„ PDF
πŸ€– AI Summary
Existing multi-agent reinforcement learning (MARL) approaches for cooperative collision avoidance among small-scale UAV swarms (≀3 agents) suffer from poor adaptability to continuous action spaces, high computational complexity, and excessive energy consumption. Method: We propose MACA, a centralized-training-with-decentralized-execution MARL algorithm featuring an actor-critic architecture and a novel marginalized state-action counterfactual baseline to address the credit assignment problem precisely. We further introduce MACAEnvβ€”a physics-aware simulation environment that faithfully models UAV dynamics and inter-agent interaction constraints. Results: Experiments demonstrate that MACA achieves over 16% higher average reward than state-of-the-art MARL baselines; compared to conventional collision-avoidance methods, it reduces task failure rate by 90% and cuts response time by more than 99%. MACA exhibits strong robustness across diverse scenarios, significantly enhancing both flight safety and energy efficiency.
πŸ“ Abstract
Multi-UAV collision avoidance is a challenging task for UAV swarm applications due to the need for tight cooperation among swarm members for collision-free path planning. Centralized Training with Decentralized Execution (CTDE) in Multi-Agent Reinforcement Learning is a promising method for multi-UAV collision avoidance, in which the key challenge is to effectively learn decentralized policies that cooperatively maximize a global reward. We propose a new multi-agent actor-critic learning scheme called MACA for UAV swarm collision avoidance. MACA uses a centralized critic to maximize the discounted global reward, which considers both safety and energy efficiency, and an actor per UAV to find decentralized collision-avoidance policies. To solve the credit assignment problem in CTDE, we design a counterfactual baseline that marginalizes both an agent's state and its action, enabling evaluation of an agent's importance in the joint observation-action space. To train and evaluate MACA, we design our own simulation environment, MACAEnv, to closely mimic the realistic behaviors of a UAV swarm. Simulation results show that MACA achieves more than 16% higher average reward than two state-of-the-art MARL algorithms and reduces failure rate by 90% and response time by over 99% compared with a conventional UAV swarm collision-avoidance algorithm in all test scenarios.
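The marginalized counterfactual baseline described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's implementation: the critic interface, the policy methods `sample` and `sample_state`, and the Monte Carlo marginalization are all assumptions, and the paper's continuous-action setting is approximated here by drawing alternative state-action pairs from the agent's own policy and averaging the centralized critic's values.

```python
import numpy as np

# Hypothetical sketch (names are illustrative, not from the paper).
# For agent i, the baseline marginalizes both agent i's action and its
# own-state entry in the joint state: alternatives are sampled from the
# agent's policy and the centralized critic's values are averaged.

def counterfactual_advantage(critic, policy_i, joint_state, joint_action,
                             agent_idx, n_samples=16, rng=None):
    """Advantage A_i = Q(s, u) - E_{s'_i, u'_i}[Q(s<-s'_i, u<-u'_i)]."""
    rng = np.random.default_rng() if rng is None else rng
    q_actual = critic(joint_state, joint_action)

    baseline = 0.0
    for _ in range(n_samples):
        alt_state = joint_state.copy()
        alt_action = joint_action.copy()
        # Replace only agent i's entries with sampled counterfactuals.
        alt_action[agent_idx] = policy_i.sample(joint_state[agent_idx])
        alt_state[agent_idx] = policy_i.sample_state(joint_state[agent_idx])
        baseline += critic(alt_state, alt_action)
    baseline /= n_samples

    return q_actual - baseline
```

Subtracting a baseline that depends only on the other agents' states and actions leaves the policy gradient unbiased while isolating agent i's contribution; marginalizing the state as well as the action (the paper's stated novelty over an action-only counterfactual) evaluates the agent in the joint observation-action space.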
Problem

Research questions and friction points this paper is trying to address.

Decentralized collision avoidance for small UAV swarms
Energy-efficient cooperation among UAVs
Overcoming continuous action space challenges
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decentralized collision avoidance for small UAV swarms
Energy-efficient cooperative Multi-Agent Reinforcement Learning
Novel credit assignment considering UAV interrelations
πŸ”Ž Similar Papers
No similar papers found.
Shuangyao Huang (University of Otago)
Haibo Zhang (School of Computing, University of Otago, Dunedin 9016, New Zealand)
Zhiyi Huang (School of Computing, University of Otago, Dunedin 9016, New Zealand)