🤖 AI Summary
To address the challenges of collaborative collision avoidance, low training efficiency, and poor generalization in complex environments for UAV swarms, this paper proposes a domain-knowledge-driven multi-agent reinforcement learning (MARL) framework. Our method integrates contour modeling from image processing with 2D potential field theory, representing obstacles as local maxima in a differentiable, physics-grounded sparse reward function—thereby eliminating the need for complex credit assignment and global observation sharing inherent in conventional MARL. This design significantly enhances training stability and scalability, enabling efficient training at scale (up to 100 agents) while maintaining robust adaptability in unknown environments lacking well-defined obstacle contours. Experiments demonstrate that our approach outperforms state-of-the-art MARL algorithms in collision avoidance success rate, trajectory smoothness, and energy efficiency, achieving approximately 40% faster training convergence.
📝 Abstract
This paper presents a multi-agent reinforcement learning (MARL) framework for cooperative collision avoidance of UAV swarms leveraging a domain-knowledge-driven reward. The reward is derived from image-processing domain knowledge, approximating contours on a two-dimensional potential field. By modeling obstacles as maxima on the field, collisions are inherently avoided, as contours never pass through peaks or intersect one another. Additionally, the resulting contour trajectories are smooth and energy-efficient. Our framework enables training with large swarm sizes because agent interaction is minimized and the need for the complex credit assignment schemes or observation-sharing mechanisms of state-of-the-art MARL approaches is eliminated. Moreover, through intensive training, UAVs acquire the ability to adapt to complex environments where contours may be non-viable or non-existent. Extensive experiments are conducted to evaluate the performance of our framework against state-of-the-art MARL algorithms.
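The core geometric intuition above can be illustrated with a minimal sketch (not the paper's exact formulation): if obstacles are modeled as Gaussian peaks on a 2D field, any level set (contour) chosen below the peak value necessarily keeps a positive clearance from every obstacle center. The obstacle positions, Gaussian width, and contour level below are hypothetical.

```python
import numpy as np

def field(x, y, obstacles, sigma=0.5):
    """Toy 2D potential field: each obstacle contributes a Gaussian peak."""
    z = np.zeros_like(x, dtype=float)
    for ox, oy in obstacles:
        z += np.exp(-((x - ox) ** 2 + (y - oy) ** 2) / (2 * sigma ** 2))
    return z

obstacles = [(0.0, 0.0), (2.0, 1.0)]              # hypothetical obstacle centers
xs, ys = np.meshgrid(np.linspace(-3, 5, 400), np.linspace(-3, 4, 400))
z = field(xs, ys, obstacles)

# Grid points lying near the contour z = 0.3, a level set below the peak (~1.0).
level = 0.3
on_contour = np.abs(z - level) < 0.03

# Minimum distance from the contour to each obstacle center: strictly positive,
# since the field attains its maximum only at the peaks themselves.
clearances = [
    np.sqrt((xs[on_contour] - ox) ** 2 + (ys[on_contour] - oy) ** 2).min()
    for ox, oy in obstacles
]
```

Following such a contour thus avoids obstacles by construction, which is the property the reward design exploits to sidestep explicit per-agent collision penalties.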