🤖 AI Summary
This work addresses the challenge of simultaneously modeling multimodal driving behaviors and enabling real-time planning in complex, real-world traffic scenarios, a balance that existing methods often fail to achieve, leading to hesitant or unsafe decisions. To this end, we propose a real-time planning framework based on a fast-sampling consistency model, which achieves robust decision-making through efficient multimodal trajectory generation and heterogeneous feature fusion. The core innovations are a fast-sampling consistency model that accelerates exploration of multimodal actions and an attention-augmented decoder that dynamically integrates scene context with action features. Experiments on the Waymax simulator show that the approach outperforms existing methods on safety metrics, with particularly strong results in dynamic, complex environments.
📝 Abstract
Closed-loop planning in complex, real-world driving scenarios presents a critical challenge for autonomous driving systems. While traditional rule-based methods are interpretable, their predefined heuristics lack the adaptability that dynamic traffic environments demand. Learning-based approaches, despite their promise, struggle to balance modeling diverse, multimodal driving behaviors against the demands of real-time planning, often leading to indecisive or unsafe actions. To address this limitation, we propose Consistency Planner, a real-time planning framework built on fast-sampling consistency models. Our approach rests on two key technical contributions. Efficient Multimodal Sampling: We employ fast-sampling consistency models to generate a diverse set of plausible future trajectories. This enables efficient, real-time exploration of multimodal actions, overcoming the computational bottlenecks of previous iterative generative methods. Heterogeneous Feature Fusion: We introduce an attention-enhanced decoder that dynamically integrates heterogeneous input features (including scene features and action tokens) into a cohesive representation for robust planning. Extensive evaluation in the Waymax simulator demonstrates superior performance in safety metrics compared to existing methods, with particularly strong results in challenging dynamic scenarios.
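To make the fast-sampling idea concrete, the following is a minimal sketch of few-step consistency sampling for trajectory candidates. It is not the Consistency Planner implementation: the abstract does not specify the network, the noise schedule, or the sampler, so every name and constant here (`toy_consistency_fn`, `sample_trajectories`, `EPS`, `T_MAX`, the straight-line mean path) is a hypothetical choice for illustration. A closed-form consistency function for unimodal Gaussian data stands in for a trained network, and the loop follows the generic multistep sampling procedure from the consistency-model literature.

```python
import numpy as np

EPS = 0.002    # smallest noise level (assumed value)
T_MAX = 80.0   # largest noise level (assumed value)


def toy_consistency_fn(x, t, mean, std):
    """Closed-form consistency function for Gaussian data N(mean, std^2).

    Stand-in for a trained network: maps a noisy sample at noise level t
    directly to a clean estimate, and satisfies f(x, EPS) = x.
    """
    scale = np.sqrt((std**2 + EPS**2) / (std**2 + t**2))
    return mean + (x - mean) * scale


def sample_trajectories(f, shape, timesteps, rng):
    """Multistep consistency sampling with len(timesteps) + 1 calls to f."""
    # Step 1: map pure noise at T_MAX to a clean estimate in a single call.
    x = f(rng.standard_normal(shape) * T_MAX, T_MAX)
    # Optional refinement: re-noise to a smaller level t, then denoise again.
    for t in timesteps:
        z = rng.standard_normal(shape)
        x_t = x + np.sqrt(t**2 - EPS**2) * z
        x = f(x_t, t)
    return x


rng = np.random.default_rng(0)
horizon, num_candidates = 8, 16

# Hypothetical mean path: straight-line motion, (x, y) waypoints in metres.
mean_path = np.stack(
    [np.linspace(2.5, 20.0, horizon), np.zeros(horizon)], axis=-1
)


def f(x, t):
    return toy_consistency_fn(x, t, mean_path, std=0.5)


# Two function evaluations yield 16 candidate trajectories of 8 waypoints.
candidates = sample_trajectories(
    f, (num_candidates, horizon, 2), timesteps=[1.0], rng=rng
)
print(candidates.shape)  # (16, 8, 2)
```

The point of the sketch is the cost profile: the number of model evaluations is fixed by the length of `timesteps` (two calls here), whereas an iterative diffusion sampler typically needs many denoising steps per sample. A real planner would replace the unimodal toy function with a learned, scene-conditioned network capable of producing multimodal outputs.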