PLAN-S: Bridging Planning with Latent Style Dynamics for Autonomous Driving World Models

2026-06-04
Citations: 0
Influential: 0

AI Summary
Existing latent world model planners generate trajectories directly from entangled latent representations, which makes risk, drivability, and driving style hard to model explicitly and leaves style difficult to control and safety harder to ensure. This work proposes PLAN-S, a bridging module between world models and planners that decodes a style-conditioned, four-channel semantic cost map from the latent representation. The cost map enters the planning decision through attention-level fusion (for regression planners) or reward-level fusion (for anchor-score planners), allowing driving style to be modulated and safer trajectories to be selected while the host backbones stay frozen. On nuScenes, PLAN-S reaches an average L2 error of 0.55 m and a 42% relative reduction in the 3 s collision rate. On NAVSIM, the rule-cost variant achieves 89.4 PDMS, while the learned-cost variant provides complementary gains on scenes that are challenging for the baseline.
πŸ“ Abstract
Latent world models (LWMs) have strengthened end-to-end autonomous driving by forecasting compact scene dynamics for downstream planning. However, existing LWM-based planners usually generate trajectories directly from entangled latent representations. This compact latent-to-planner pathway lacks explicit modeling of risk, drivability, and diverse style preferences, making driving-style dynamics difficult to supervise, inspect, or modulate before a final trajectory is selected. We propose PLAN-S (PLANning with latent Style dynamics), a planner-facing bridge that addresses this compactness-controllability dilemma by decoding a style-conditioned, four-channel semantic cost map from the latent representation. The cost map is conditioned on ego state and driving style and is consumed upstream of the planning decision through two host-side interfaces: attention-level fusion for regression planners and reward-level fusion for anchor-score planners. We validate PLAN-S on two architecturally distinct hosts, ResWorld on nuScenes and WoTE on NAVSIM, while keeping the host backbones frozen to isolate the contribution of the proposed bridge. On nuScenes, PLAN-S reduces L2 at every horizon over the baseline, with 0.55 m average L2 and a 42% relative reduction in the 3 s collision rate. On NAVSIM, the rule-cost variant reaches 89.4 Predictive Driver Model Score (PDMS), while the learned-cost variant provides complementary gains on baseline-challenging scenes. Ablations show that the cost pathway contributes most directly to safer trajectory selection. Qualitative results further show that PLAN-S can produce diverse cost maps, with spatially consistent variations aligned to different driving styles.
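The reward-level fusion mentioned in the abstract can be pictured with a small sketch. This is not the paper's implementation: the function name `fuse_anchor_scores`, the per-channel `style_weights`, the grid layout, and the mean-cost penalty are all assumptions made for illustration. It only shows the general idea of re-scoring candidate (anchor) trajectories with a style-weighted cost map while the host planner's scores are left as they are.

```python
import numpy as np


def fuse_anchor_scores(anchor_scores, anchors_xy, cost_map, style_weights,
                       cell_size=0.5, fusion_weight=1.0):
    """Subtract a style-weighted map cost from each anchor trajectory's score.

    Hypothetical interface, assumed for illustration only.
    anchor_scores: (K,) host planner scores, higher is better.
    anchors_xy:    (K, T, 2) waypoints in metres, ego at the grid centre.
    cost_map:      (C, H, W) non-negative cost channels (C = 4 here).
    style_weights: (C,) per-channel weights standing in for a driving style.
    """
    _, H, W = cost_map.shape
    # Collapse the channels into a single cost grid for the chosen style.
    fused = np.tensordot(style_weights, cost_map, axes=1)  # (H, W)
    # Map metric waypoints to the nearest grid cell, clipped to the map bounds.
    cols = np.clip(np.round(anchors_xy[..., 0] / cell_size + W / 2).astype(int), 0, W - 1)
    rows = np.clip(np.round(anchors_xy[..., 1] / cell_size + H / 2).astype(int), 0, H - 1)
    # Average the cost along each trajectory and penalise the host score with it.
    traj_cost = fused[rows, cols].mean(axis=1)  # (K,)
    return anchor_scores - fusion_weight * traj_cost


# Toy usage: anchor 0 crosses a costly cell in channel 0, anchor 1 avoids it.
cost_map = np.zeros((4, 20, 20))
cost_map[0, 10, 11] = 3.0
anchors = np.array([[[0, 0], [1, 0], [2, 0]],
                    [[0, 3], [1, 3], [2, 3]]], dtype=float)
host_scores = np.array([0.9, 0.5])

cautious = fuse_anchor_scores(host_scores, anchors, cost_map,
                              np.array([1.0, 0.0, 0.0, 0.0]), cell_size=1.0)
indifferent = fuse_anchor_scores(host_scores, anchors, cost_map,
                                 np.array([0.0, 1.0, 0.0, 0.0]), cell_size=1.0)
print(int(np.argmax(cautious)), int(np.argmax(indifferent)))
```

In this toy setup the selected anchor changes with the style weights alone, which is the kind of modulation the abstract attributes to the cost pathway; how PLAN-S actually conditions and fuses its cost map is described in the paper itself.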
Problem

Research questions and friction points this paper is trying to address.

latent world models
autonomous driving
trajectory planning
driving style
controllability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Latent World Models
Driving Style
Semantic Cost Map
Autonomous Driving Planning
Controllability
Xiaoyun Qiu
Intelligent Transportation Thrust, Systems Hub, and Center of Seamless Connectivity & Connected Intelligence, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou 511453, China
Jingtao He
Intelligent Transportation Thrust, Systems Hub, and Center of Seamless Connectivity & Connected Intelligence, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou 511453, China
Yijie Chen
Professor of Wenzhou Medical University, Postdoc researcher in UCSD, Ph.D. in SJTU
Nanomedicine, Detoxification, Vaccination
Yusong Huang
Intelligent Transportation Thrust, Systems Hub, and Center of Seamless Connectivity & Connected Intelligence, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou 511453, China
Haotian Wang
The Hong Kong University of Science and Technology (Guangzhou)
computer vision, 3D vision, multi-modal fusion
Yixuan Wang
Chinese University of Hong Kong
Machine Learning, Neural Networks, Bioinformatics
Xinhu Zheng
Assistant Professor, The Hong Kong University of Science and Technology (Guangzhou)