From Imitation to Exploration: End-to-end Autonomous Driving based on World Model

📅 2024-10-03

🤖 AI Summary

Existing end-to-end autonomous driving approaches rely heavily on imitation learning (IL) and generalize poorly to highly dynamic, interaction-intensive traffic scenarios. To address this, we propose a progressive IL-to-RL training paradigm within an end-to-end framework that combines multimodal perception (RGB and LiDAR), an asymmetric variational autoencoder (VAE), a Transformer-based world model, and an actor-critic reinforcement learning (RL) policy, enabling a smooth transition from imitating expert behavior to learning through exploration. Training stability is further improved by initializing the policy with IL and applying a KL loss together with soft parameter updates. On the CARLA benchmarks, the method achieves state-of-the-art route completion on Leaderboard 1.0 and completes all 38 scenarios on Leaderboard 2.0. The code will be open-sourced upon paper acceptance.
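To make the VAE component concrete, the following Python sketch shows the two pieces any Gaussian VAE relies on: reparameterized sampling of the latent and the closed-form KL divergence to a standard-normal prior. It is an illustration under stated assumptions, not RAMBLE's implementation: the encoder that maps RGB and LiDAR inputs to `mu` and `log_var` is omitted, the asymmetric encoder/decoder design is not modeled, and the names `reparameterize`, `kl_to_standard_normal`, `vae_loss` and the `beta` weight are hypothetical.

```python
import math
import random
from typing import List, Sequence


def reparameterize(mu: Sequence[float], log_var: Sequence[float],
                   rng: random.Random) -> List[float]:
    """Draw z = mu + sigma * eps with eps ~ N(0, 1) and sigma = exp(log_var / 2).

    Sampling through eps keeps z differentiable w.r.t. mu and log_var
    when the same formula is written in an autodiff framework.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """Closed-form KL(N(mu, diag(sigma^2)) || N(0, I)), summed over dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def vae_loss(recon_error: float, mu: Sequence[float], log_var: Sequence[float],
             beta: float = 1.0) -> float:
    """Reconstruction term plus a beta-weighted KL regularizer (beta is hypothetical)."""
    return recon_error + beta * kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy encoder outputs; in the described model these would come from the
    # omitted encoder that processes RGB and LiDAR features.
    mu, log_var = [0.2, -0.1, 0.5], [0.0, 0.0, 0.0]
    z = reparameterize(mu, log_var, rng)
    print(len(z), round(kl_to_standard_normal(mu, log_var), 3))  # prints: 3 0.15
```

The KL term pulls the latent distribution toward the prior, so the world model and policy operate on a compact, regularized context vector rather than raw sensor data.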

📝 Abstract

In recent years, end-to-end autonomous driving architectures have gained increasing attention because they avoid error accumulation. Most existing end-to-end autonomous driving methods are based on Imitation Learning (IL), which can quickly derive driving strategies by mimicking expert behavior. However, IL often struggles with scenarios outside the training dataset, especially in highly dynamic and interaction-intensive traffic environments. In contrast, Reinforcement Learning (RL)-based driving models can optimize driving decisions through interaction with the environment, improving adaptability and robustness. To leverage the strengths of both IL and RL, we propose RAMBLE, an end-to-end world model-based RL method for driving decision-making. RAMBLE extracts environmental context information from RGB images and LiDAR data through an asymmetrical variational autoencoder. A transformer-based architecture is then used to capture the dynamic transitions of traffic participants. Next, an actor-critic reinforcement learning algorithm derives driving strategies from the latent features of the current state and dynamics. To accelerate policy convergence and ensure stable training, we introduce a training scheme that initializes the policy network using IL and employs a KL loss and soft updates to smoothly transition the model from IL to RL. RAMBLE achieves state-of-the-art route completion on CARLA Leaderboard 1.0 and completes all 38 scenarios on CARLA Leaderboard 2.0, demonstrating its effectiveness in complex and dynamic traffic scenarios. The model will be open-sourced upon paper acceptance at https://github.com/SCP-CN-001/ramble to support further research and development in autonomous driving.
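The IL-to-RL transition can be sketched as a KL penalty that keeps the RL policy close to the IL-initialized policy, combined with soft (Polyak) parameter updates. This is a minimal sketch under assumptions the source does not confirm: a discrete action space, the KL direction KL(π_IL || π_RL), a linearly decaying KL weight, and a target parameter vector updated with rate `tau`. All function names (`kl_categorical`, `regularized_actor_loss`, `kl_weight`, `soft_update`) are hypothetical.

```python
import math
from typing import List, Sequence


def kl_categorical(p: Sequence[float], q: Sequence[float], eps: float = 1e-8) -> float:
    """KL(p || q) between two discrete action distributions given as probability lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0.0)


def regularized_actor_loss(rl_loss: float, il_probs: Sequence[float],
                           policy_probs: Sequence[float], beta: float) -> float:
    """RL actor loss plus a KL penalty that keeps the policy near the IL policy.

    The KL direction, KL(pi_IL || pi_RL), and the weight beta are assumptions.
    """
    return rl_loss + beta * kl_categorical(il_probs, policy_probs)


def kl_weight(step: int, decay_steps: int, beta0: float) -> float:
    """Hypothetical linear schedule: beta falls from beta0 to 0 over decay_steps."""
    return beta0 * max(0.0, 1.0 - step / decay_steps)


def soft_update(target: Sequence[float], online: Sequence[float], tau: float) -> List[float]:
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * o + (1.0 - tau) * t for t, o in zip(target, online)]


if __name__ == "__main__":
    il_probs = [0.7, 0.2, 0.1]      # frozen IL policy over 3 toy actions
    policy_probs = [0.5, 0.3, 0.2]  # current RL policy
    for step in (0, 5_000, 10_000):
        beta = kl_weight(step, decay_steps=10_000, beta0=1.0)
        loss = regularized_actor_loss(1.0, il_probs, policy_probs, beta)
        print(step, round(beta, 2), round(loss, 4))
    print(soft_update([0.0, 0.0], [1.0, 2.0], tau=0.1))  # prints: [0.1, 0.2]
```

Decaying the KL weight lets the policy lean on the IL prior early and on RL rewards later; a small `tau` makes the target parameters change slowly, which is the usual reason soft updates stabilize training.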

Problem

Research questions and friction points this paper is trying to address.

Improving adaptability in dynamic traffic environments

Combining the benefits of imitation and reinforcement learning

Enhancing decision-making with world model-based RL

Innovation

Methods, ideas, or system contributions that make the work stand out.

World model-based RL for autonomous driving

Asymmetrical variational autoencoder for extracting environmental context from RGB and LiDAR data

IL-initialized RL with a KL loss and soft updates for a smooth IL-to-RL transition
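World model-based RL usually trains the actor-critic on trajectories imagined in the learned latent space. Assuming a Dreamer-style setup (the source does not say which return estimator RAMBLE uses), the critic's targets can be TD(λ) returns computed backward over an imagined rollout, R_t = r_t + γ((1 - λ)V(s_{t+1}) + λR_{t+1}), with the last value estimate used as a bootstrap. The function names and the default `gamma`/`lam` values below are illustrative assumptions.

```python
from typing import List, Sequence


def lambda_returns(rewards: Sequence[float], values: Sequence[float],
                   bootstrap: float, gamma: float = 0.99,
                   lam: float = 0.95) -> List[float]:
    """TD(lambda) targets over an imagined trajectory of length H.

    values[t] is the critic estimate V(s_t) for t = 0..H-1, and bootstrap is
    V(s_H) for the state after the last imagined step.
    """
    if len(rewards) != len(values):
        raise ValueError("rewards and values must have the same length")
    next_values = list(values[1:]) + [bootstrap]  # V(s_{t+1}) for each t
    returns = [0.0] * len(rewards)
    ret = bootstrap  # R_H = V(s_H)
    for t in reversed(range(len(rewards))):
        ret = rewards[t] + gamma * ((1.0 - lam) * next_values[t] + lam * ret)
        returns[t] = ret
    return returns


def advantages(returns: Sequence[float], values: Sequence[float]) -> List[float]:
    """Advantage estimates R_t - V(s_t) that could weight the actor update."""
    return [r - v for r, v in zip(returns, values)]


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]  # toy imagined rewards
    values = [0.0, 0.0, 0.0]   # toy critic estimates V(s_t)
    rets = lambda_returns(rewards, values, bootstrap=0.0, gamma=0.5, lam=1.0)
    print(rets, advantages(rets, values))  # prints: [1.75, 1.5, 1.0] [1.75, 1.5, 1.0]
```

With `lam=0` the target reduces to the one-step TD target r_t + γV(s_{t+1}); with `lam=1` it becomes the discounted sum of imagined rewards plus the discounted bootstrap value.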