🤖 AI Summary
Autonomous vehicle planning in complex, dense traffic must avoid collisions and follow the intended path accurately. Searching for the best action sequence is difficult when the policy, next-state predictor, and critic all have to be learned. Method: This paper proposes Differentiable Simulation for Search (DSS), a framework that uses Waymax, a differentiable traffic simulator, as both the next-state predictor and the critic. DSS relies on the simulator's hardcoded dynamics for accurate state predictions and exploits its differentiability to search over continuous action sequences. It combines gradient descent over imagined future trajectories with stochastic search. Contribution/Results: Compared with sequence prediction, imitation learning, model-free reinforcement learning, and other planning baselines, DSS significantly improves tracking and path planning accuracy. The results indicate that gradient-informed search through an accurate, differentiable simulator is an effective alternative to learning every planning component.
📝 Abstract
Planning allows an agent to safely refine its actions before executing them in the real world. In autonomous driving, this is crucial to avoid collisions and navigate complex, dense traffic scenarios. One way to plan is to search for the best action sequence. However, this is challenging when all necessary components (policy, next-state predictor, and critic) have to be learned. Here we propose Differentiable Simulation for Search (DSS), a framework that leverages the differentiable simulator Waymax as both a next-state predictor and a critic. It relies on the simulator's hardcoded dynamics, making state predictions highly accurate, while utilizing the simulator's differentiability to effectively search across action sequences. Our DSS agent optimizes its actions using gradient descent over imagined future trajectories. We show experimentally that DSS, the combination of planning gradients and stochastic search, significantly improves tracking and path planning accuracy compared to sequence prediction, imitation learning, model-free RL, and other planning methods.
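A minimal sketch of the idea, under loudly stated assumptions: a toy 1D point mass stands in for Waymax (no Waymax API is used), and the names `rollout`, `cost_and_grad`, `plan`, `DT`, and `REG`, as well as the random-restart scheme, are hypothetical illustrations rather than the paper's implementation. The simulator plays the next-state predictor, the cost plays the critic, and the gradient with respect to the actions is obtained by backpropagating through the rollout by hand. Waymax is built on JAX, so in that setting automatic differentiation would supply the gradient instead.

```python
import random

DT = 0.1    # simulator time step in seconds (assumed)
REG = 0.01  # weight of the action-magnitude penalty (assumed)


def rollout(p0, v0, actions):
    """Hypothetical toy simulator standing in for Waymax's hardcoded dynamics:
    a 1D point mass integrated with semi-implicit Euler.
    Returns the imagined positions [p_0, ..., p_T]."""
    p, v = p0, v0
    positions = [p]
    for a in actions:
        v = v + DT * a  # the action (acceleration) updates velocity
        p = p + DT * v  # the new velocity updates position
        positions.append(p)
    return positions


def cost_and_grad(p0, v0, actions, ref):
    """Hypothetical critic: squared tracking error to ref[1..T] plus an action
    penalty, with its exact gradient via a hand-written backward pass."""
    positions = rollout(p0, v0, actions)
    T = len(actions)
    cost = sum((positions[t] - ref[t]) ** 2 for t in range(1, T + 1))
    cost += REG * sum(a * a for a in actions)

    grad = [0.0] * T
    adj_p = 2.0 * (positions[T] - ref[T])  # dC/dp_T
    adj_v = 0.0  # dC/dv_T: v_T affects no position inside the horizon
    for t in reversed(range(T)):
        # One step: p_{t+1} = p_t + DT*v_t + DT^2*a_t, v_{t+1} = v_t + DT*a_t
        grad[t] = DT * DT * adj_p + DT * adj_v + 2.0 * REG * actions[t]
        adj_v = adj_v + DT * adj_p  # dC/dv_t, using dC/dp_{t+1}
        if t >= 1:
            adj_p = adj_p + 2.0 * (positions[t] - ref[t])  # dC/dp_t
    return cost, grad


def plan(p0, v0, ref, horizon, n_samples=8, n_steps=300, lr=0.2, sigma=1.0, seed=0):
    """Stochastic search plus gradient refinement (an assumed scheme):
    sample random action sequences, refine each by gradient descent through
    the simulator, and keep the sequence the critic scores lowest."""
    rng = random.Random(seed)
    best_actions, best_cost = None, float("inf")
    for _ in range(n_samples):
        actions = [rng.gauss(0.0, sigma) for _ in range(horizon)]
        for _ in range(n_steps):
            _, grad = cost_and_grad(p0, v0, actions, ref)
            actions = [a - lr * g for a, g in zip(actions, grad)]
        cost, _ = cost_and_grad(p0, v0, actions, ref)
        if cost < best_cost:
            best_actions, best_cost = actions, cost
    return best_actions, best_cost


if __name__ == "__main__":
    horizon = 20
    # Reference path produced by a constant unit acceleration.
    ref = rollout(0.0, 0.0, [1.0] * horizon)
    actions, cost = plan(0.0, 0.0, ref, horizon)
    print(f"best imagined cost: {cost:.4f}")
```

Random restarts provide the stochastic-search component, and gradient descent through the rollout refines each candidate. Because the toy dynamics are linear, the cost here is a convex quadratic and a small fixed step size suffices; a real traffic simulator with collision terms would be non-convex, which is where sampling several starting points is expected to help.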