AI Summary
Existing latent world models generate trajectories directly from entangled latent representations, which makes it difficult to explicitly model risk, drivability, and driving style and, in turn, leaves driving style uncontrollable and safety compromised. This work proposes PLAN-S, a bridging module that connects world models with planners by decoding a style-conditioned, four-channel semantic cost map from the latent representation. The cost map enters the planning decision through either attention-level fusion (for regression planners) or reward-level fusion (for anchor-score planners), enabling controllable modulation of driving style and safer trajectory selection while the host backbone stays frozen. On nuScenes, PLAN-S reduces L2 error at every horizon relative to the baseline, reaching an average L2 of 0.55 m, and achieves a 42% relative reduction in the 3 s collision rate. On NAVSIM, the rule-cost variant reaches 89.4 PDMS, while the learned-cost variant provides complementary gains on scenes that are challenging for the baseline.
Abstract
Latent world models (LWMs) have strengthened end-to-end autonomous driving by forecasting compact scene dynamics for downstream planning. However, existing LWM-based planners usually generate trajectories directly from entangled latent representations. This compact latent-to-planner pathway lacks explicit modeling of risk, drivability, and diverse style preferences, making driving-style dynamics difficult to supervise, inspect, or modulate before a final trajectory is selected. We propose PLAN-S (PLANning with latent Style dynamics), a planner-facing bridge that addresses this compactness-controllability dilemma by decoding a style-conditioned, four-channel semantic cost map from the latent representation. The cost map is conditioned on ego state and driving style and is consumed upstream of the planning decision through two host-side interfaces: attention-level fusion for regression planners and reward-level fusion for anchor-score planners. We validate PLAN-S on two architecturally distinct hosts, ResWorld on nuScenes and WoTE on NAVSIM, while keeping the host backbones frozen to isolate the contribution of the proposed bridge. On nuScenes, PLAN-S reduces L2 at every horizon over the baseline, with 0.55 m average L2 and a 42% relative reduction in the 3 s collision rate. On NAVSIM, the rule-cost variant reaches 89.4 Predictive Driver Model Score (PDMS), while the learned cost variant provides complementary gains on baseline-challenging scenes. Ablations show that the cost pathway contributes most directly to safer trajectory selection. Qualitative results further show that PLAN-S can produce diverse cost maps, with spatially consistent variations aligned to different driving styles.
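To make the reward-level interface more concrete, the following is a minimal NumPy sketch of how a four-channel cost map might re-score the candidate trajectories of an anchor-score planner. It is an illustration under stated assumptions, not the PLAN-S implementation: the function names `sample_cost` and `fuse_scores`, the grid extent, the nearest-cell lookup, the per-channel weights, and the trade-off factor `lam` are all hypothetical. In particular, style is represented here only by a weight vector over channels, whereas the abstract describes the cost map itself as being decoded conditioned on ego state and style.

```python
import numpy as np

def sample_cost(cost_map, trajs, extent=32.0):
    """Look up per-channel cost at each waypoint (nearest grid cell).

    cost_map: (C, H, W) array, ego at the grid center, covering
              [-extent, extent] meters on both axes (assumed layout).
    trajs:    (K, T, 2) array of (x, y) waypoints in meters, ego frame.
    returns:  (K, T, C) array of sampled costs.
    """
    _, H, W = cost_map.shape
    # Convert metric coordinates to grid indices and clamp to the map.
    cols = np.floor((trajs[..., 0] + extent) / (2 * extent) * W).astype(int)
    rows = np.floor((trajs[..., 1] + extent) / (2 * extent) * H).astype(int)
    cols = np.clip(cols, 0, W - 1)
    rows = np.clip(rows, 0, H - 1)
    # Advanced indexing yields (C, K, T); move channels to the last axis.
    return cost_map[:, rows, cols].transpose(1, 2, 0)

def fuse_scores(host_scores, cost_map, trajs, channel_weights, lam=1.0):
    """Subtract a weighted trajectory cost from the host planner's scores."""
    per_channel = sample_cost(cost_map, trajs).mean(axis=1)  # (K, C)
    cost = per_channel @ channel_weights                     # (K,)
    return host_scores - lam * cost

# Toy scene: channel 0 is high along the straight-ahead corridor.
cost_map = np.zeros((4, 64, 64))
cost_map[0, 32, 33:39] = 1.0

xs = np.arange(1.0, 7.0)
straight = np.stack([xs, np.zeros_like(xs)], axis=-1)    # y = 0 m
offset = np.stack([xs, np.full_like(xs, 4.0)], axis=-1)  # y = 4 m
trajs = np.stack([straight, offset])                     # (2, 6, 2)

host_scores = np.array([0.6, 0.5])            # host prefers anchor 0
weights = np.array([0.5, 0.2, 0.2, 0.1])      # hypothetical style weights
fused = fuse_scores(host_scores, cost_map, trajs, weights)
best = int(np.argmax(fused))
```

In this toy setup the host alone would pick the straight anchor, but after fusion the offset anchor scores higher because the straight path crosses the high-cost cells. Changing `weights` shifts how strongly each channel penalizes a candidate, which is one simple way to picture style-dependent trajectory selection without touching the host scores themselves.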