From Imitation to Exploration: End-to-end Autonomous Driving based on World Model

๐Ÿ“… 2024-10-03
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿค– AI Summary
Existing end-to-end autonomous driving approaches rely heavily on imitation learning (IL) and generalize poorly in highly dynamic, interaction-intensive traffic scenarios. To address this, the authors propose RAMBLE, an end-to-end framework built on an IL-to-RL progressive training paradigm: it integrates multimodal perception (RGB and LiDAR), an asymmetrical variational autoencoder (VAE), and a Transformer-based world model, enabling the policy to transition from static imitation to dynamic exploration. Policy stability during this transition is maintained via a KL-based distillation loss and soft parameter updates. RAMBLE achieves state-of-the-art route completion on the CARLA Leaderboard 1.0 and completes all 38 scenarios on Leaderboard 2.0. The code will be open-sourced upon paper acceptance.

๐Ÿ“ Abstract
In recent years, end-to-end autonomous driving architectures have gained increasing attention due to their advantage in avoiding error accumulation. Most existing end-to-end autonomous driving methods are based on Imitation Learning (IL), which can quickly derive driving strategies by mimicking expert behaviors. However, IL often struggles to handle scenarios outside the training dataset, especially in highly dynamic and interaction-intensive traffic environments. In contrast, Reinforcement Learning (RL)-based driving models can optimize driving decisions through interaction with the environment, improving adaptability and robustness. To leverage the strengths of both IL and RL, we propose RAMBLE, an end-to-end world model-based RL method for driving decision-making. RAMBLE extracts environmental context information from RGB images and LiDAR data through an asymmetrical variational autoencoder. A transformer-based architecture is then used to capture the dynamic transitions of traffic participants. Next, an actor-critic reinforcement learning algorithm is applied to derive driving strategies based on the latent features of the current state and dynamics. To accelerate policy convergence and ensure stable training, we introduce a training scheme that initializes the policy network using IL, and employs a KL loss and a soft update mechanism to smoothly transition the model from IL to RL. RAMBLE achieves state-of-the-art performance in route completion rate on the CARLA Leaderboard 1.0 and completes all 38 scenarios on the CARLA Leaderboard 2.0, demonstrating its effectiveness in handling complex and dynamic traffic scenarios. The model will be open-sourced upon paper acceptance at https://github.com/SCP-CN-001/ramble to support further research and development in autonomous driving.
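The IL-to-RL transition described above rests on two standard mechanisms: a KL penalty that keeps the RL policy close to the frozen IL-initialized policy, and soft (Polyak) parameter updates. The paper's exact formulation is not given here, so the following is a minimal sketch under assumed diagonal-Gaussian action distributions; all names (`gaussian_kl`, `soft_update`, `beta`, the example means and standard deviations) are illustrative, not from the paper.

```python
import numpy as np

def gaussian_kl(mu_p, sigma_p, mu_q, sigma_q):
    """KL(p || q) between diagonal Gaussian policies, summed over action dims."""
    return np.sum(
        np.log(sigma_q / sigma_p)
        + (sigma_p**2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q**2)
        - 0.5
    )

def soft_update(target_params, source_params, tau=0.005):
    """Polyak averaging: blend a small fraction of the source into the target."""
    return (1.0 - tau) * target_params + tau * source_params

# Hypothetical action distributions: frozen IL policy vs. current RL policy.
mu_il, sigma_il = np.array([0.20, 0.00]), np.array([0.30, 0.30])
mu_rl, sigma_rl = np.array([0.25, 0.05]), np.array([0.35, 0.28])

# Actor loss = RL objective + beta * KL term pulling the RL policy toward IL.
beta = 0.1
rl_loss = 1.0  # placeholder for the actor-critic loss value
total_loss = rl_loss + beta * gaussian_kl(mu_rl, sigma_rl, mu_il, sigma_il)

# Soft update of a target network's (flattened) parameter vector.
target = soft_update(np.zeros(4), np.ones(4), tau=0.1)
```

Annealing `beta` toward zero over training would let the policy drift from imitation to pure exploration, while the small `tau` keeps each parameter update incremental.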
Problem

Research questions and friction points this paper is trying to address.

Improving adaptability in dynamic traffic environments
Combining Imitation and Reinforcement Learning benefits
Enhancing decision-making with world model-based RL
Innovation

Methods, ideas, or system contributions that make the work stand out.

World model-based RL for autonomous driving
Asymmetrical variational autoencoder for multimodal feature extraction
IL-initialized RL with KL loss and soft update
Yueyuan Li
Department of Automation, Shanghai Jiao Tong University, Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, CN
Mingyang Jiang
Shanghai Jiao Tong University
robotics · intelligent vehicle · machine learning
Songan Zhang
Global Institute of Future Technology, Shanghai Jiao Tong University
Autonomous Vehicle · Robotics · AI
Wei Yuan
Innovation Center of Intelligent Connected Vehicles, Global Institute of Future Technology, Shanghai Jiao Tong University, Shanghai, 200240, CN
Chunxiang Wang
Department of Automation, Shanghai Jiao Tong University, Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, CN
Ming Yang
Department of Automation, Shanghai Jiao Tong University, Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, CN