🤖 AI Summary
Current end-to-end autonomous driving planners are constrained by the quality of a static set of candidate trajectories, so even a strong trajectory scorer can only choose among the candidates it is given. This work proposes TOAD, a method that, for the first time, treats the scorer as a trajectory-level reward function during inference and uses the Cross-Entropy Method to optimize trajectories, warm-started from the planner's initial proposals. This improves trajectory quality at test time without retraining. TOAD is plug-and-play and compatible with diverse end-to-end planners, consistently improving performance across six base models. It achieves state-of-the-art results on multiple benchmarks, including NAVSIM-v1 (94.7 PDMS), NAVSIM-v2 (56.3 EPDMS), and the closed-loop HUGSIM benchmark.
📝 Abstract
End-to-end planners for autonomous driving typically generate a set of candidate trajectories, score each one, and return the highest-scoring candidate. However, the scorer is applied only after the proposals are generated and cannot influence the set of trajectories: a weak set of candidates limits planning performance regardless of the scorer's quality. We instead treat the scorer as a learned trajectory-level reward function and search for trajectories that maximize it. Our method, TOAD, runs the Cross-Entropy Method at test time, warm-started from the planner's proposals. It requires no retraining and is plug-and-play for existing planners. Across six base planners, TOAD improves results on NAVSIM-v1 (94.7 PDMS), NAVSIM-v2 (56.3 EPDMS), and the closed-loop HUGSIM benchmark. The code will be made publicly available via the project page: https://valeoai.github.io/TOAD/.
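The idea of warm-starting the Cross-Entropy Method from a planner's proposals can be illustrated with a minimal sketch. This is not the TOAD implementation: the function names (`cem_refine`, `toy_scorer`), the diagonal Gaussian sampling distribution, the hyperparameters, and the toy scorer standing in for a learned one are all assumptions made for illustration. The abstract does not specify these details.

```python
import numpy as np


def cem_refine(proposals, scorer, n_iters=5, n_samples=64, n_elites=8,
               min_std=1e-3, seed=0):
    """Illustrative CEM refinement, warm-started from planner proposals.

    proposals: array of shape (K, T, 2), K candidate trajectories of T waypoints.
    scorer:    callable mapping (N, T, 2) -> (N,) scores, higher is better.
    Returns the best trajectory found and its score.
    """
    rng = np.random.default_rng(seed)
    proposals = np.asarray(proposals, dtype=float)

    # Score the original proposals and remember the best one, so the result
    # is never worse (under the scorer) than plain "score and pick the top".
    scores = scorer(proposals)
    best_idx = int(np.argmax(scores))
    best_traj = proposals[best_idx].copy()
    best_score = float(scores[best_idx])

    # Warm start: fit a diagonal Gaussian to the proposals (an assumption).
    mean = proposals.mean(axis=0)
    std = proposals.std(axis=0) + min_std

    for _ in range(n_iters):
        # Sample new trajectories around the current distribution.
        noise = rng.standard_normal((n_samples,) + mean.shape)
        samples = mean + std * noise
        sample_scores = scorer(samples)

        # Refit the distribution to the highest-scoring (elite) samples.
        elite_idx = np.argsort(sample_scores)[-n_elites:]
        elites = samples[elite_idx]
        mean = elites.mean(axis=0)
        std = elites.std(axis=0) + min_std

        # Track the best trajectory seen so far.
        top = int(np.argmax(sample_scores))
        if sample_scores[top] > best_score:
            best_traj = samples[top].copy()
            best_score = float(sample_scores[top])

    return best_traj, best_score


def toy_scorer(trajs):
    """Hypothetical stand-in for a learned scorer: closeness to a straight path."""
    target = np.stack([np.linspace(0.0, 10.0, 8), np.zeros(8)], axis=-1)
    return -np.sum((trajs - target) ** 2, axis=(1, 2))


# Usage: six random "proposals" of eight 2D waypoints each.
proposals = np.random.default_rng(1).normal(size=(6, 8, 2))
refined, refined_score = cem_refine(proposals, toy_scorer)
```

Because the sketch keeps the best original proposal as a fallback, the returned score can only match or exceed what the select-the-top-candidate baseline would achieve under the same scorer. Whether a higher scorer value translates into better driving depends on how well the scorer reflects true trajectory quality.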