DeMo++: Motion Decoupling for Autonomous Driving

📅 2025-07-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing autonomous driving motion prediction and planning methods predominantly adopt a one-to-one query-trajectory paradigm, which struggles to accurately model complex spatiotemporal evolution, often leading to collisions or suboptimal decisions. To address this, we propose a motion-decoupled framework that decomposes prediction into two synergistic branches: (i) holistic intent, capturing multimodal directional preferences, and (ii) fine-grained spatiotemporal state, modeling dynamic trajectory evolution. We further introduce a cross-scene trajectory interaction mechanism to jointly model motions in adjacent scenes. Our architecture fuses Attention and Mamba modules to balance efficient scene understanding with precise temporal modeling, and supports self-optimizing trajectory refinement. Evaluated on four major benchmarks (Argoverse 2, nuScenes, nuPlan, and NAVSIM), our method achieves state-of-the-art performance across motion prediction, motion planning, and end-to-end driving tasks.

📝 Abstract
Motion forecasting and planning are tasked with estimating the trajectories of traffic agents and the ego vehicle, respectively, to ensure the safety and efficiency of autonomous driving systems in dynamically changing environments. State-of-the-art methods typically adopt a one-query-one-trajectory paradigm, where each query corresponds to a unique trajectory for predicting multi-mode trajectories. While this paradigm can produce diverse motion intentions, it often falls short in modeling the intricate spatiotemporal evolution of trajectories, which can lead to collisions or suboptimal outcomes. To overcome this limitation, we propose DeMo++, a framework that decouples motion estimation into two distinct components: holistic motion intentions to capture the diverse potential directions of movement, and fine spatiotemporal states to track the agent's dynamic progress within the scene and enable a self-refinement capability. Further, we introduce a cross-scene trajectory interaction mechanism to explore the relationships between motions in adjacent scenes. This allows DeMo++ to comprehensively model both the diversity of motion intentions and the spatiotemporal evolution of each trajectory. To effectively implement this framework, we develop a hybrid model combining Attention and Mamba. This architecture leverages the strengths of both mechanisms for efficient scene information aggregation and precise trajectory state sequence modeling. Extensive experiments demonstrate that DeMo++ achieves state-of-the-art performance across various benchmarks, including motion forecasting (Argoverse 2 and nuScenes), motion planning (nuPlan), and end-to-end planning (NAVSIM).
Problem

Research questions and friction points this paper is trying to address.

The dominant one-query-one-trajectory paradigm struggles to model the intricate spatiotemporal evolution of trajectories
This shortfall can lead to collisions or suboptimal planning decisions
Forecasting and planning must remain safe and efficient in dynamically changing driving environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decouples motion estimation into holistic intentions and fine-grained spatiotemporal states
Introduces a cross-scene trajectory interaction mechanism for joint modeling of adjacent scenes
Combines Attention and Mamba for efficient scene aggregation and precise trajectory state modeling
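The decoupling idea above can be illustrated with a minimal sketch: one set of queries captures multimodal intentions by attending to scene features, while a second set is processed sequentially to model per-timestep state evolution, and the two are fused into multi-mode waypoints. This is a toy illustration only, not the paper's implementation; the linear recurrence here is a simple stand-in for a Mamba block, and all names and dimensions are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(queries, keys, values):
    # Scaled dot-product attention: each query aggregates scene context.
    scores = queries @ keys.T / np.sqrt(keys.shape[-1])
    return softmax(scores, axis=-1) @ values

def decoupled_decode(scene_feats, intent_q, state_q, rng):
    """Toy decoupled decoder: K intention queries capture multimodal
    directional preferences; T state queries model stepwise evolution."""
    K, D = intent_q.shape
    T, _ = state_q.shape
    # (i) Holistic intention branch: mode queries attend to scene features.
    intents = attend(intent_q, scene_feats, scene_feats)       # (K, D)
    # (ii) Spatiotemporal state branch: sequential scan over timesteps
    # (a simple recurrence standing in for a Mamba state-space block).
    h = np.zeros(D)
    states = []
    for t in range(T):
        h = np.tanh(0.5 * h + state_q[t])
        states.append(h)
    states = np.stack(states)                                  # (T, D)
    # Fuse: combine every mode with every timestep state -> 2D waypoints.
    W = rng.standard_normal((D, 2)) * 0.1                      # toy head
    traj = (intents[:, None, :] + states[None, :, :]) @ W      # (K, T, 2)
    return traj
```

With, say, 6 intention queries and a 12-step horizon, the decoder emits a (6, 12, 2) tensor of candidate trajectories, making explicit how mode diversity and temporal evolution are handled by separate query sets.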
Bozhou Zhang
School of Data Science, Fudan University
Nan Song
Chungbuk National University
Molecular epidemiology, Genetic epidemiology, Pharmacogenomics
Xiatian Zhu
University of Surrey
Machine Learning, Computer Vision
Li Zhang
School of Data Science, Fudan University