SigmaRL: A Sample-Efficient and Generalizable Multi-Agent Reinforcement Learning Framework for Motion Planning

📅 2024-08-14

🏛️ 2024 IEEE 27th International Conference on Intelligent Transportation Systems (ITSC)

📈 Citations: 3

✨ Influential: 0

🤖 AI Summary

To address the low sample efficiency and poor cross-scenario generalization of multi-agent reinforcement learning (MARL) in motion planning for connected and automated vehicles (CAVs), this paper introduces SigmaRL, an open-source, decentralized MARL framework. Its core contribution is a set of five strategies for designing information-dense observations, built on general features that apply to most traffic scenarios. With these observations, training takes under one hour on a single CPU. Agents trained on one intersection generalize zero-shot to completely unseen traffic scenarios: a new intersection, an on-ramp, and a roundabout.

📝 Abstract

This paper introduces an open-source, decentralized framework named SigmaRL, designed to enhance both sample efficiency and generalization of multi-agent Reinforcement Learning (RL) for motion planning of connected and automated vehicles. Most RL agents exhibit a limited capacity to generalize, often focusing narrowly on specific scenarios, and are usually evaluated in similar or even the same scenarios seen during training. Various methods have been proposed to address these challenges, including experience replay and regularization. However, how observation design in RL affects sample efficiency and generalization remains an under-explored area. We address this gap by proposing five strategies to design information-dense observations, focusing on general features that are applicable to most traffic scenarios. We train our RL agents using these strategies on an intersection and evaluate their generalization through numerical experiments across completely unseen traffic scenarios, including a new intersection, an on-ramp, and a roundabout. Incorporating these information-dense observations reduces training times to under one hour on a single CPU, and the evaluation results reveal that our RL agents can effectively zero-shot generalize.

Problem

Research questions and friction points this paper is trying to address.

Enhancing sample efficiency in multi-agent RL for motion planning

Improving generalization of RL agents across unseen traffic scenarios

Exploring observation design impact on RL performance and training time

Innovation

Methods, ideas, or system contributions that make the work stand out.

Decentralized multi-agent RL framework

Information-dense observation strategies

Zero-shot generalization across scenarios
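
The summary above does not enumerate the five observation strategies, so the following is only an illustrative sketch of what a scenario-agnostic, information-dense observation could look like, not the SigmaRL implementation. It assumes each vehicle is described by a hypothetical (x, y, yaw) tuple and expresses the nearest neighbors relative to the ego vehicle, so the same local traffic situation yields the same observation regardless of where it occurs on the map. The function name, the feature layout, and the zero-padding scheme are all assumptions made for this example.

```python
import math


def ego_frame_observation(ego, neighbors, max_neighbors=2):
    """Describe nearby vehicles relative to the ego vehicle (hypothetical helper).

    ego and each neighbor are (x, y, yaw) tuples in a shared world frame.
    Returns a flat list of (distance, longitudinal offset, lateral offset,
    relative yaw) for the closest `max_neighbors` vehicles, zero-padded.
    """
    ex, ey, eyaw = ego
    cos_y, sin_y = math.cos(eyaw), math.sin(eyaw)

    features = []
    for nx, ny, nyaw in neighbors:
        dx, dy = nx - ex, ny - ey
        # Rotate the world-frame offset by -yaw to get ego-frame coordinates.
        lon = cos_y * dx + sin_y * dy
        lat = -sin_y * dx + cos_y * dy
        dist = math.hypot(dx, dy)
        # Wrap the heading difference into [-pi, pi).
        dyaw = (nyaw - eyaw + math.pi) % (2.0 * math.pi) - math.pi
        features.append((dist, lon, lat, dyaw))

    # Keep only the closest vehicles so the observation has a fixed size.
    features.sort(key=lambda f: f[0])
    features = features[:max_neighbors]

    # Assumption: missing neighbors are padded with zeros.
    while len(features) < max_neighbors:
        features.append((0.0, 0.0, 0.0, 0.0))

    return [value for feat in features for value in feat]


# The same relative situation at two different world poses.
obs_a = ego_frame_observation((0.0, 0.0, 0.0), [(2.0, 0.0, 0.0)])
obs_b = ego_frame_observation((5.0, 5.0, math.pi / 2), [(5.0, 7.0, math.pi / 2)])
```

Here obs_a and obs_b agree up to floating-point error even though the two situations sit at different map positions and headings. Removing map-specific coordinates in this way is one plausible reason an observation design could help a policy trained on one intersection transfer to an unseen on-ramp or roundabout.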
