Divide and Merge: Motion and Semantic Learning in End-to-End Autonomous Driving

πŸ“… 2025-02-11
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
In end-to-end autonomous driving, tight coupling between semantic understanding and motion modeling induces negative transfer across tasks, degrading detection and tracking performance. To address this, the paper proposes a framework that decouples semantic and motion learning. Its core contribution is Neural-Bayes motion decoding, which enables parallel, decoupled inference for 3D detection, multi-object tracking, and trajectory prediction, while an interactive semantic decoding scheme fosters positive transfer across semantic tasks. The architecture pairs learnable motion queries with a unified set of recursively updated reference points shared with the detection and tracking queries, integrating Bayes-filter principles with interactive Transformer decoding. Evaluated on nuScenes, the approach improves 3D detection by 5% and multi-object tracking (AMOTA) by 11%. In open-loop planning, it attains state-of-the-art collision rates without requiring any modifications to the planning module.

πŸ“ Abstract
Perceiving the environment and its changes over time corresponds to two fundamental yet heterogeneous types of information: semantics and motion. Previous end-to-end autonomous driving works represent both types of information in a single feature vector. However, including motion tasks, such as prediction and planning, always impairs detection and tracking performance, a phenomenon known as negative transfer in multi-task learning. To address this issue, we propose Neural-Bayes motion decoding, a novel parallel detection, tracking, and prediction method separating semantic and motion learning, similar to the Bayes filter. Specifically, we employ a set of learned motion queries that operate in parallel with the detection and tracking queries, sharing a unified set of recursively updated reference points. Moreover, we employ interactive semantic decoding to enhance information exchange in semantic tasks, promoting positive transfer. Experiments on the nuScenes dataset show improvements of 5% in detection and 11% in tracking. Our method achieves state-of-the-art collision rates in open-loop planning evaluation without any modifications to the planning module.
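The abstract's key mechanism — motion queries running in parallel with detection/tracking queries while sharing recursively updated reference points, in the spirit of a Bayes filter — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the linear heads `W_motion` and `W_semantic`, the fixed `gain`, and all sizes are hypothetical stand-ins for the learned Transformer decoder components.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 4, 8  # number of agent queries, feature dim (illustrative sizes)

# Two parallel query sets indexing the same agents:
# semantic queries serve detection/tracking, motion queries serve prediction.
semantic_queries = rng.normal(size=(N, D))
motion_queries = rng.normal(size=(N, D))

# Shared reference points (x, y) play the role of the Bayes-filter state
# estimate; both branches read and refine the same set.
ref_points = np.zeros((N, 2))

# Hypothetical linear heads standing in for learned decoder layers.
W_motion = rng.normal(scale=0.1, size=(D, 2))    # motion query -> predicted displacement
W_semantic = rng.normal(scale=0.1, size=(D, 2))  # semantic query -> observed offset

def step(ref, sem_q, mot_q, gain=0.5):
    """One recursive decoding step, mirroring a Bayes filter.

    Predict: the motion branch propagates the reference point forward.
    Update:  the semantic branch supplies observation-like evidence that
             corrects the prediction, blended by a fixed gain here
             (a learned, query-dependent weighting in a real model).
    """
    predicted = ref + mot_q @ W_motion            # prediction from motion queries
    observed = ref + sem_q @ W_semantic           # evidence from semantic queries
    return predicted + gain * (observed - predicted)  # Kalman-style correction

# Recursively update the shared reference points over decoder steps.
for _ in range(3):
    ref_points = step(ref_points, semantic_queries, motion_queries)

print(ref_points.shape)  # one shared 2D reference point per agent
```

Because each branch attends only to its own queries and communicates solely through the shared reference points, gradients from motion objectives need not flow through the semantic features — the decoupling the paper credits with reducing negative transfer.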
Problem

Research questions and friction points this paper is trying to address.

Semantics and motion are heterogeneous kinds of information, yet prior end-to-end methods compress both into a single feature vector
Adding motion tasks (prediction, planning) consistently impairs detection and tracking performance
Negative transfer between tasks in multi-task learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Neural-Bayes motion decoding
Parallel detection, tracking, and prediction with shared, recursively updated reference points
Interactive semantic decoding
πŸ”Ž Similar Papers
No similar papers found.