🤖 AI Summary
Existing end-to-end autonomous driving systems rely on multi-task heads to model perception, prediction, and planning separately. Although these architectures are trained in a fully differentiable manner, the tasks remain weakly coupled and poorly coordinated. This paper proposes DiffAD, a diffusion probabilistic model that reformulates driving decision-making as a conditional bird's-eye-view (BEV) image generation task. Heterogeneous driving targets are rasterized into a shared BEV grid, their latent distribution is modeled, and perception, prediction, and planning are optimized jointly, with the BEV image refined through iterative denoising in the reverse process. Removing the separate task-specific heads reduces system complexity and improves coordination between tasks. In closed-loop CARLA simulations, the method reports a new state-of-the-art Success Rate and Driving Score.
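To make the rasterization idea concrete, here is a minimal NumPy sketch of drawing different kinds of driving targets into one multi-channel BEV grid. It is an illustration under stated assumptions, not the paper's implementation: the channel layout (`CHANNELS`), the grid size, the metric extent, and the helper names `world_to_pixel`, `densify`, and `rasterize` are all hypothetical, since the source does not specify them.

```python
import numpy as np

# Hypothetical channel layout; the source does not specify one.
CHANNELS = {"lane": 0, "agent": 1, "plan": 2}


def world_to_pixel(xy, grid_size=64, extent_m=32.0):
    """Map ego-frame metres (x forward, y left) to (row, col) pixel indices."""
    xy = np.asarray(xy, dtype=float)
    scale = grid_size / (2.0 * extent_m)  # pixels per metre
    rows = np.floor((extent_m - xy[:, 0]) * scale).astype(int)  # forward is up
    cols = np.floor((extent_m - xy[:, 1]) * scale).astype(int)  # left is left
    return rows, cols


def densify(polyline, step_m=0.5):
    """Resample a polyline at roughly step_m spacing so it draws as a connected line."""
    polyline = np.asarray(polyline, dtype=float)
    pieces = []
    for a, b in zip(polyline[:-1], polyline[1:]):
        n = max(int(np.ceil(np.linalg.norm(b - a) / step_m)), 1)
        t = np.linspace(0.0, 1.0, n, endpoint=False)[:, None]
        pieces.append(a + t * (b - a))
    pieces.append(polyline[-1:])  # include the final vertex
    return np.concatenate(pieces, axis=0)


def rasterize(entities, grid_size=64, extent_m=32.0):
    """Draw (kind, points) pairs into a shared (channels, H, W) BEV image."""
    bev = np.zeros((len(CHANNELS), grid_size, grid_size), dtype=np.float32)
    for kind, points in entities:
        rows, cols = world_to_pixel(points, grid_size, extent_m)
        inside = (rows >= 0) & (rows < grid_size) & (cols >= 0) & (cols < grid_size)
        bev[CHANNELS[kind], rows[inside], cols[inside]] = 1.0  # drop off-grid points
    return bev


# Toy scene: one lane centerline, one agent, and a straight ego plan.
entities = [
    ("lane", densify([[-10.0, 0.0], [30.0, 0.0]])),
    ("agent", np.array([[10.0, 3.0]])),
    ("plan", densify([[0.0, 0.0], [20.0, 0.0]])),
]
bev = rasterize(entities)
```

The point of the shared grid is that lanes, agents, and the ego plan all become channels of the same image, so a single generative model can treat them jointly rather than through separate heads.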
📝 Abstract
End-to-end autonomous driving (E2E-AD) has rapidly emerged as a promising approach toward achieving full autonomy. However, existing E2E-AD systems typically adopt a traditional multi-task framework, addressing perception, prediction, and planning tasks through separate task-specific heads. Despite being trained in a fully differentiable manner, they still encounter issues with task coordination, and the system complexity remains high. In this work, we introduce DiffAD, a novel diffusion probabilistic model that redefines autonomous driving as a conditional image generation task. By rasterizing heterogeneous targets onto a unified bird's-eye view (BEV) and modeling their latent distribution, DiffAD unifies various driving objectives and jointly optimizes all driving tasks in a single framework, significantly reducing system complexity and harmonizing task coordination. The reverse process iteratively refines the generated BEV image, resulting in more robust and realistic driving behaviors. Closed-loop evaluations in CARLA demonstrate the superiority of the proposed method, achieving a new state-of-the-art Success Rate and Driving Score. The code will be made publicly available.
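The iterative refinement in the reverse process can be sketched with a generic DDPM-style ancestral sampler. This is a sketch under assumptions, not DiffAD's actual sampler: the abstract does not state the noise schedule, step count, or network, so the linear schedule, `make_schedule`, `sample_bev`, and the stand-in `oracle_noise` predictor below are all hypothetical. The sketch also denoises the BEV image directly, whereas the paper models a latent distribution. In a real system, `predict_noise` would be a learned network conditioned on sensor features; here the condition simply carries a known target so the loop can be checked end to end.

```python
import numpy as np


def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (an assumption; the source does not specify one)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def sample_bev(predict_noise, condition, shape, num_steps=50, seed=0):
    """Start from Gaussian noise and denoise step by step into a BEV image."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    x = rng.standard_normal(shape)
    for t in reversed(range(num_steps)):
        eps = predict_noise(x, t, condition)
        # DDPM posterior mean given the predicted noise.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(shape)
        else:
            x = mean  # no noise is added at the final step
    return x


def oracle_noise(x, t, condition, num_steps=20):
    """Stand-in for a learned denoiser: recovers the exact noise for a known target."""
    _, _, alpha_bars = make_schedule(num_steps)
    return (x - np.sqrt(alpha_bars[t]) * condition) / np.sqrt(1.0 - alpha_bars[t])


# Toy target: a 3-channel BEV image with a short vertical "plan" stripe.
target = np.zeros((3, 16, 16))
target[2, 4:12, 8] = 1.0
generated = sample_bev(oracle_noise, target, target.shape, num_steps=20)
```

With the oracle predictor, the final step reproduces the target up to floating-point error, which makes the loop easy to sanity-check. With a learned predictor, each step only moves the sample toward the conditioned distribution, which is what the abstract describes as refining the generated BEV image.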