๐ค AI Summary
Existing end-to-end autonomous driving approaches rely heavily on imitation learning (IL), exhibiting poor generalization in highly dynamic, interactive traffic scenarios. To address this, we propose an IL-to-RL progressive training paradigm, introducing an end-to-end framework that integrates multimodal perception (RGB-LiDAR), an asymmetric variational autoencoder (VAE), and a Transformer-based joint world modelโenabling robust policy transfer from static imitation to dynamic exploration. We further enhance policy stability via KL-constrained knowledge distillation and soft parameter updates. Evaluated on the CARLA Leaderboard, our method achieves state-of-the-art (SOTA) performance: it is the only end-to-end model to successfully complete all 38 high-difficulty dynamic scenarios in Leaderboard 2.0, with significantly higher route completion rates than prior work. The source code is publicly available.
๐ Abstract
In recent years, end-to-end autonomous driving architectures have gained increasing attention due to their advantage in avoiding error accumulation. Most existing end-to-end autonomous driving methods are based on Imitation Learning (IL), which can quickly derive driving strategies by mimicking expert behaviors. However, IL often struggles to handle scenarios outside the training dataset, especially in high-dynamic and interaction-intensive traffic environments. In contrast, Reinforcement Learning (RL)-based driving models can optimize driving decisions through interaction with the environment, improving adaptability and robustness. To leverage the strengths of both IL and RL, we propose RAMBLE, an end-to-end world model-based RL method for driving decision-making. RAMBLE extracts environmental context information from RGB images and LiDAR data through an asymmetrical variational autoencoder. A transformer-based architecture is then used to capture the dynamic transitions of traffic participants. Next, an actor-critic structure reinforcement learning algorithm is applied to derive driving strategies based on the latent features of the current state and dynamics. To accelerate policy convergence and ensure stable training, we introduce a training scheme that initializes the policy network using IL, and employs KL loss and soft update mechanisms to smoothly transition the model from IL to RL. RAMBLE achieves state-of-the-art performance in route completion rate on the CARLA Leaderboard 1.0 and completes all 38 scenarios on the CARLA Leaderboard 2.0, demonstrating its effectiveness in handling complex and dynamic traffic scenarios. The model will be open-sourced upon paper acceptance at https://github.com/SCP-CN-001/ramble to support further research and development in autonomous driving.