AI Summary
This work addresses insufficient behavioral diversity, poor controllability, and low safety in autonomous driving planning with a three-stage framework, DriveAnchor. First, it replaces the unstructured Gaussian prior with a structured vocabulary of 2,398 trajectory shapes built by farthest-point sampling, so that behavioral diversity follows from vocabulary coverage. Second, it post-trains an Energy Field module, conditioned on static road geometry alone, jointly with the flow-matching model; the module relocates anchors toward user-specified corridor polygons before flow generation, which adds corridor control without differentiable guidance. Third, it applies zeroth-order reinforcement learning in anchor space: because the single-step flow-matching model is deterministic, each anchor uniquely determines the output trajectory, so reward optimization reduces to a direction search over anchors, with no log-likelihood computation or ODE-to-SDE conversion. Evaluated on approximately 2 million held-out driving scenarios, the method reduces near-range collision rates by 89% and improves mean reward by 32% without degrading imitation accuracy, and runs inference in 2.06 ms on NVIDIA Drive Orin. Real-world vehicle testing supports its practicality for deployment.
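As a rough illustration of how farthest-point sampling can build a trajectory vocabulary, the sketch below greedily picks the trajectory farthest from everything selected so far. It is not the authors' implementation: the function name, the `(N, T, 2)` array layout, and the use of plain Euclidean distance on flattened waypoints are all assumptions made for the example.

```python
import numpy as np

def farthest_point_sampling(trajs, k, start=0):
    """Pick k mutually distant trajectories from trajs of shape (N, T, 2).

    Hypothetical helper: the distance metric (Euclidean on flattened
    waypoints) and the starting index are assumptions, not from the paper.
    """
    flat = trajs.reshape(len(trajs), -1)
    selected = [start]
    # Distance from every trajectory to its nearest already-selected one.
    min_dist = np.linalg.norm(flat - flat[start], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(min_dist))  # farthest from the current set
        selected.append(nxt)
        dist_to_new = np.linalg.norm(flat - flat[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return np.array(selected)
```

Under these assumptions, calling the function with `k=2398` on a pool of logged trajectories would yield a vocabulary of the size reported in the abstract; the greedy rule spreads the selected shapes across the pool instead of concentrating them on the most common maneuvers.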
Abstract
We present DriveAnchor, a three-stage framework for autonomous driving planning that achieves behavioral diversity, controllability, and safety in a composable pipeline. Demonstration Flow Pretraining replaces the unstructured Gaussian prior with a vocabulary of 2,398 trajectory shapes constructed by farthest-point sampling, structurally grounding behavioral diversity in vocabulary coverage. Guided Flow Post-training jointly post-trains an Energy Field module with flow matching (FM), conditioning the Energy Field on static road geometry alone, to relocate anchors toward user-specified corridor polygons before flow generation, adding controllability without differentiable guidance; after Stage 2, new corridor presets require only Energy Field updates, not FM retraining. Reward-Refined Flow Fine-tuning applies zeroth-order reinforcement learning to align each anchor's output with collision-avoidance objectives: because the flow-matching model is a deterministic feedforward network in single-step mode, each anchor uniquely determines the output trajectory, reducing reward optimization to a direction search in anchor space without log-likelihood computation or ODE-to-SDE conversion. Evaluated on approximately 2 million held-out driving scenarios, DriveAnchor reduces near-range collision rates by 89% and improves mean reward by 32% without degradation in imitation accuracy, with 2.06 ms inference on NVIDIA Drive Orin. DriveAnchor has been validated through real-world vehicle testing, confirming its practicality for production deployment.
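The direction search in anchor space described for the third stage can be pictured with a generic zeroth-order update. The sketch below is a minimal example under stated assumptions, not the paper's algorithm: `generate` stands in for the deterministic single-step flow-matching model, `reward` for the collision-avoidance objective, and the antithetic random-direction estimator with its `sigma`, `lr`, and `n_dirs` parameters is a common choice that the abstract does not specify.

```python
import numpy as np

def zeroth_order_step(anchor, generate, reward, rng, sigma=0.1, lr=0.1, n_dirs=16):
    """One zeroth-order ascent step on the reward, searching in anchor space.

    Only reward evaluations are used: no gradients through `generate`,
    no log-likelihoods. All names and defaults here are hypothetical.
    """
    direction = np.zeros_like(anchor)
    for _ in range(n_dirs):
        u = rng.standard_normal(anchor.shape)
        # Antithetic pair: evaluate the reward on both sides of the anchor.
        r_plus = reward(generate(anchor + sigma * u))
        r_minus = reward(generate(anchor - sigma * u))
        direction += (r_plus - r_minus) / (2.0 * sigma) * u
    return anchor + lr * direction / n_dirs

# Toy stand-ins (assumptions): the "model" maps an anchor straight to a
# trajectory, and the reward prefers trajectories close to a safe target.
target = np.array([1.0, -2.0, 0.5, 3.0])
generate = lambda a: a
reward = lambda traj: -float(np.sum((traj - target) ** 2))

rng = np.random.default_rng(0)
anchor = np.zeros(4)
initial_reward = reward(generate(anchor))
for _ in range(50):
    anchor = zeroth_order_step(anchor, generate, reward, rng)
final_reward = reward(generate(anchor))
```

Because each anchor maps to exactly one trajectory, comparing rewards at perturbed anchors is enough to find an improving direction, which is why a search of this kind needs neither a likelihood model nor a stochastic sampler.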