Control Your Queries: Heterogeneous Query Interaction for Camera-Radar Fusion

📅 2026-04-28
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing camera-radar fusion methods are limited in how they initialize queries and in how queries interact across modalities, which keeps them from fully exploiting the complementary strengths of the two sensors. This work proposes a heterogeneous query interaction paradigm, realized in the ConFusion detector, that introduces learnable world queries and jointly optimizes image, radar, and world queries for 3D object detection. Its two core components are QMix, a cross-type attention mechanism that consolidates complementary object evidence, and QSwap, an interactive query swap sampling strategy that lets related queries exchange informative feature tokens. ConFusion achieves state-of-the-art results on the nuScenes benchmark: 59.1 mAP and 65.6 NDS on the validation set, and 61.6 mAP and 67.9 NDS on the test set.
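To make the idea of cross-type attention concrete, here is a minimal sketch in plain Python. It is not the paper's QMix implementation: the function name `cross_type_attention`, the absence of learned projections, and the residual update are all assumptions chosen for illustration. The only property it demonstrates is that each query attends exclusively to queries of a different type (image, radar, or world).

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_type_attention(queries, types):
    """Hypothetical cross-type mixing step (illustrative, not the paper's QMix).

    queries: list of equal-length feature vectors (lists of floats)
    types:   list of type labels, e.g. "image", "radar", "world"

    Each query attends only to queries whose type differs from its own,
    using scaled dot-product scores and no learned weights (an assumption).
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    mixed = []
    for i, query in enumerate(queries):
        # Candidate keys: every query of a different type.
        others = [j for j, t in enumerate(types) if t != types[i]]
        if not others:
            # No other type present: leave the query unchanged.
            mixed.append(list(query))
            continue
        scores = [
            scale * sum(a * b for a, b in zip(query, queries[j]))
            for j in others
        ]
        weights = softmax(scores)
        context = [
            sum(w * queries[j][d] for w, j in zip(weights, others))
            for d in range(dim)
        ]
        # Residual update: original feature plus cross-type context.
        mixed.append([x + c for x, c in zip(query, context)])
    return mixed
```

With one image query `[1.0, 0.0]` and one radar query `[0.0, 1.0]`, each query has a single cross-type partner, so both end up carrying the other's evidence in addition to their own. A real detector would presumably add learned query, key, and value projections and multiple heads; those are omitted here to keep the masking logic visible.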
📝 Abstract
In autonomous driving, camera-radar fusion offers complementary sensing and low deployment cost. Existing methods perform fusion through input mixing, feature map mixing, or query-based feature sampling. We propose a new fusion paradigm, termed heterogeneous query interaction, and present ConFusion, a camera-radar 3D object detector. ConFusion combines image queries, radar queries, and learnable world queries distributed in 3D space to improve query initialization and object coverage. To encourage cross-type interaction among heterogeneous queries, we introduce heterogeneous query mixing (QMix), which performs dedicated cross-type attention after feature sampling to consolidate complementary object evidence. We further propose interactive query swap sampling (QSwap), which improves feature sampling by allowing related queries to exchange informative feature tokens under attention and geometric constraints. Experiments on the nuScenes dataset show that ConFusion achieves state-of-the-art performance, reaching 59.1 mAP and 65.6 NDS on the validation set, and 61.6 mAP and 67.9 NDS on the test set.
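The abstract states that QSwap lets related queries exchange informative feature tokens under attention and geometric constraints, but it does not give the exact rule. The sketch below is one possible reading, written in plain Python, and every specific in it is an assumption: the function name `swap_tokens`, the Euclidean `radius` used as the geometric constraint, the `min_attention` threshold used as the attention constraint, and the policy of replacing a query's weakest token with a neighbor's strongest one.

```python
import math


def swap_tokens(queries, radius=2.0, min_attention=0.5):
    """Hypothetical token exchange between related queries (not the paper's QSwap).

    queries: list of dicts with
        "center": (x, y, z) reference point of the query in 3D space
        "tokens": list of (token_id, attention_weight) pairs sampled by the query

    A neighbor's strongest token replaces a query's weakest token when
      * the two centers lie within `radius` (assumed geometric constraint), and
      * the donor weight is at least `min_attention` (assumed attention constraint)
        and exceeds the weight of the token it replaces.
    """
    updated = []
    for i, query in enumerate(queries):
        tokens = list(query["tokens"])
        for j, other in enumerate(queries):
            if i == j:
                continue
            # Geometric constraint: only nearby queries count as related.
            if math.dist(query["center"], other["center"]) > radius:
                continue
            # Donors are read from the original tokens, so the result
            # does not depend on the order in which queries are processed.
            donor = max(other["tokens"], key=lambda t: t[1])
            # Attention constraint: only informative tokens are shared.
            if donor[1] < min_attention or donor in tokens:
                continue
            weakest = min(range(len(tokens)), key=lambda k: tokens[k][1])
            if donor[1] > tokens[weakest][1]:
                tokens[weakest] = donor
        updated.append({"center": query["center"], "tokens": tokens})
    return updated
```

As a usage example, two queries one meter apart would trade their best tokens, while a third query ten meters away would keep its own samples untouched. The thresholds here are placeholders, not values from the paper.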
Problem

Research questions and friction points this paper is trying to address.

camera-radar fusion
3D object detection
heterogeneous query interaction
autonomous driving
Innovation

Methods, ideas, or system contributions that make the work stand out.

heterogeneous query interaction
query mixing
query swap sampling
camera-radar fusion
3D object detection