🤖 AI Summary
To address the challenges of online trajectory safety assessment and weak prediction capability in end-to-end autonomous driving, this paper proposes a real-time trajectory evaluation framework based on a lightweight differentiable bird’s-eye view (BEV) world model. The model explicitly predicts multi-step future scene states in BEV space, enabling end-to-end differentiable training and joint optimization with a simulator to achieve low-latency (<50 ms) and high-fidelity trajectory safety assessment. To our knowledge, this is the first work to integrate a compact BEV world model into online trajectory evaluation. The method achieves state-of-the-art performance on the NAVSIM and Bench2Drive closed-loop benchmarks, significantly improving collision avoidance rate and path plausibility. The source code is publicly available.
📝 Abstract
End-to-end autonomous driving has achieved remarkable progress by integrating perception, prediction, and planning into a fully differentiable framework. Yet, to fully realize its potential, effective online trajectory evaluation is indispensable to ensure safety. Trajectory evaluation becomes much more effective when it can forecast the future outcomes of a given trajectory, and a world model that captures environmental dynamics and predicts future states makes this possible. We therefore propose WoTE, an end-to-end driving framework that leverages a BEV World model to predict future BEV states for Trajectory Evaluation. The proposed BEV world model has lower latency than image-level world models and can be seamlessly supervised using off-the-shelf BEV-space traffic simulators. We validate our framework on both the NAVSIM benchmark and the closed-loop Bench2Drive benchmark based on the CARLA simulator, achieving state-of-the-art performance. Code is released at https://github.com/liyingyanUCAS/WoTE.
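The idea of scoring a candidate trajectory against predicted future BEV states can be illustrated with a toy sketch. This is not WoTE's architecture: the learned BEV world model is replaced here by a hand-written constant-velocity rollout on an occupancy grid, and every name (`GRID`, `predict_bev`, `score_trajectory`, `select_trajectory`) is hypothetical and chosen only for illustration.

```python
import numpy as np

GRID = 20  # hypothetical BEV grid size, in cells per side


def predict_bev(obstacles, velocities, steps):
    """Toy stand-in for a BEV world model (assumption: constant velocity).

    obstacles, velocities: integer arrays of shape (N, 2), in grid cells.
    Returns a boolean occupancy array of shape (steps, GRID, GRID).
    """
    future = np.zeros((steps, GRID, GRID), dtype=bool)
    for t in range(steps):
        pos = obstacles + velocities * (t + 1)
        # Drop obstacles that have left the grid instead of wrapping around.
        inside = np.all((pos >= 0) & (pos < GRID), axis=1)
        rows, cols = pos[inside].T
        future[t, rows, cols] = True
    return future


def score_trajectory(trajectory, future):
    """Count the steps at which the ego cell is predicted to be occupied."""
    steps = np.arange(len(trajectory))
    return int(future[steps, trajectory[:, 0], trajectory[:, 1]].sum())


def select_trajectory(candidates, future):
    """Return the index of the lowest-scoring candidate and all scores."""
    scores = [score_trajectory(c, future) for c in candidates]
    return int(np.argmin(scores)), scores


# One obstacle approaching the ego vehicle head-on along row 10.
obstacles = np.array([[10, 13]])
velocities = np.array([[0, -1]])
future = predict_bev(obstacles, velocities, steps=3)

straight = np.array([[10, 8], [10, 9], [10, 10]])  # meets the obstacle at step 3
swerve = np.array([[10, 8], [11, 9], [12, 10]])    # leaves row 10
best, scores = select_trajectory([straight, swerve], future)
print(best, scores)
```

In this sketch the score is a plain collision count. A learned evaluator would presumably combine richer cues from the predicted BEV states, but the select-by-predicted-outcome structure is the same.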