🤖 AI Summary
This work addresses two limitations: existing optimization-based air traffic scheduling methods handle the stochasticity of Advanced Air Mobility (AAM) poorly, and multi-agent reinforcement learning (MARL) approaches generalize poorly across diverse airspace structures. To overcome these challenges, the authors propose a MARL framework that combines a relative polar coordinate state representation with a lightweight Transformer encoder and a decentralized conflict resolution strategy. The framework trains a single policy that issues speed advisories to resolve conflicts under varied traffic patterns and intersection angles. Experiments in both structured and unstructured airspace indicate that the approach is adaptable and scalable. Notably, a single-layer Transformer encoder achieves near-zero near mid-air collision rates and shorter loss-of-separation infringements than the 2- and 3-layer variants, and it also outperforms a pure attention-based baseline.
📝 Abstract
Conventional optimization-based metering depends on strict adherence to precomputed schedules, which limits the flexibility required for the stochastic operations of Advanced Air Mobility (AAM). In contrast, multi-agent reinforcement learning (MARL) offers a decentralized, adaptive framework that can better handle uncertainty, which is essential for safe aircraft separation assurance. Despite this advantage, current MARL approaches often overfit to specific airspace structures, limiting their adaptability to new configurations. To improve generalization, we recast the MARL problem in a relative polar state space and train a transformer encoder model across diverse traffic patterns and intersection angles. The learned model provides speed advisories that resolve conflicts while keeping aircraft near their desired cruising speeds. In our experiments, we evaluated encoder depths of 1, 2, and 3 layers in both structured and unstructured airspaces and found that the single-layer encoder configuration outperformed the deeper variants, yielding near-zero near mid-air collision rates and shorter loss-of-separation infringements. Additionally, we showed that the same configuration outperforms a baseline model designed purely with attention. Together, our results suggest that the newly formulated state representation, the neural network architecture, and the proposed training strategy provide an adaptable and scalable decentralized solution for aircraft separation assurance in both structured and unstructured airspaces.
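
To illustrate why a relative polar state space can help a policy generalize across intersection angles, here is a minimal Python sketch. It is not the paper's formulation: the function names, the choice of features (range, bearing, relative heading), and the angle convention (radians, counterclockwise from the +x axis) are all assumptions made for this example.

```python
import math


def wrap_angle(angle):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def relative_polar_state(own_xy, own_heading, intruder_xy, intruder_heading):
    """Describe an intruder relative to the ownship (hypothetical feature set).

    Returns (range, bearing, relative_heading), where bearing is the
    direction to the intruder measured from the ownship's heading.
    """
    dx = intruder_xy[0] - own_xy[0]
    dy = intruder_xy[1] - own_xy[1]
    rng = math.hypot(dx, dy)
    bearing = wrap_angle(math.atan2(dy, dx) - own_heading)
    rel_heading = wrap_angle(intruder_heading - own_heading)
    return rng, bearing, rel_heading


def rotate(xy, theta):
    """Rotate a 2D point about the origin by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * xy[0] - s * xy[1], s * xy[0] + c * xy[1])


# The same encounter, expressed in two different absolute frames.
base = relative_polar_state((0.0, 0.0), 0.0, (3.0, 4.0), math.pi / 2)

theta, shift = 1.0, (10.0, -7.0)  # arbitrary rotation and translation
ix, iy = rotate((3.0, 4.0), theta)
moved = relative_polar_state(
    shift, theta, (ix + shift[0], iy + shift[1]), math.pi / 2 + theta
)
```

Under these assumptions, `base` and `moved` agree up to floating-point rounding: rotating or translating the whole encounter leaves the relative polar features unchanged, so a policy that only sees such features is not tied to the absolute orientation of a particular route or intersection.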