INTACT: Ego-Guided Typed Sparse Evidence Retrieval for Heterogeneous Collaborative Perception

📅 2026-06-03

🤖 AI Summary
This work addresses a deployment problem in collaborative perception: heterogeneous sensors and models make intermediate feature alignment difficult and make new participants costly to integrate. The authors propose an ego-guided, typed sparse evidence retrieval framework in which the ego vehicle issues typed queries for suspected objects and evidence-deficient regions, and collaborators respond only with local evidence at the queried locations. The ego selects useful responses through sparse per-query routing and injects them through gated residual write-back. This shifts the compatibility requirement from global feature-map alignment to local, typed response comparability under ego-issued queries, so the ego interface is trained once and new collaborators join through checkpoint merging with no additional training. On OPV2V-H, the method achieves 80.1 AP70 with only 0.52M additional parameters and a log₂ communication volume of 18.0 (about 16× compression over dense feature transmission); on the real-world DAIR-V2X dataset it attains 43.8 AP50.
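The relation between the reported communication volume and the compression ratio can be checked with a few lines of arithmetic. The sketch below assumes that the compression ratio is the plain ratio of dense to sparse volume on the same log₂ scale; the unit behind that scale is not stated here, and the helper name `dense_log2_volume` is hypothetical.

```python
import math

def dense_log2_volume(sparse_log2: float, compression: float) -> float:
    """Return the log2 volume of the dense baseline implied by a compression ratio.

    Assumption: compression = dense_volume / sparse_volume, with both volumes
    measured in the same (unspecified) unit.
    """
    return sparse_log2 + math.log2(compression)

sparse_log2 = 18.0   # log2 communication volume reported for INTACT on OPV2V-H
compression = 16.0   # approximate compression ratio reported in the summary

dense_log2 = dense_log2_volume(sparse_log2, compression)
print(dense_log2)  # 16x is 2**4, so the dense baseline sits 4 higher on the log2 scale
```

Under this assumption, a 16× compression corresponds to a difference of 4 on the log₂ scale, so the dense baseline would sit at roughly 22.0.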
📝 Abstract
Collaborative perception extends the perceptual range of autonomous vehicles by sharing information across agents, but heterogeneous sensors and perception models make intermediate feature fusion difficult to deploy at scale. Existing heterogeneous collaboration methods typically follow a translation-first paradigm: collaborator features must be aligned, adapted, or projected into an ego-compatible space before fusion. Such feature-compatibility contracts improve fixed-system performance, but they couple deployment to collaborator-specific adaptation and make newly joined heterogeneous agents costly to integrate. To address this gap, we propose INTACT, an ego-guided typed sparse evidence retrieval framework for heterogeneous collaborative perception. Instead of translating an entire collaborator feature map, INTACT lets the ego vehicle issue typed evidence queries that express suspected objects and evidence-deficient regions. Collaborators respond only with local evidence at queried locations, and the ego selects useful responses through sparse per-query routing and injects them through gated residual write-back. This changes the compatibility requirement from global feature-map interpretability to local, typed response comparability under ego-issued queries, enabling a zero-training heterogeneous insertion protocol in which the ego interface is trained once and new collaborators join through checkpoint merging. Extensive experiments on simulated and real-world heterogeneous collaborative perception benchmarks validate the effectiveness and deployability of INTACT. On OPV2V-H, INTACT achieves 80.1 AP70 with only 0.52M additional parameters and 18.0 $\log_2$ communication volume, corresponding to about 16$\times$ compression over dense feature transmission. On DAIR-V2X, INTACT achieves 43.8 AP50 under challenging real-world conditions.
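The query, response, routing, and write-back steps described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the class names, the type labels, the scalar `score` used for routing, and the sigmoid gate are all assumptions made for illustration, and the learned components of the real model are replaced with fixed functions.

```python
import math
from dataclasses import dataclass

@dataclass
class Query:
    cell: tuple   # (row, col) location in the ego grid
    qtype: str    # hypothetical type labels: "object" or "deficit"

@dataclass
class Response:
    agent: str
    evidence: list  # local evidence vector at the queried cell
    score: float    # hypothetical relevance score used for routing

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def route_top_k(responses, k=1):
    """Sparse per-query routing: keep only the k highest-scoring responses."""
    return sorted(responses, key=lambda r: r.score, reverse=True)[:k]

def write_back(ego_feat, selected):
    """Gated residual write-back: ego + gate * evidence for each selected response."""
    out = list(ego_feat)
    for r in selected:
        gate = sigmoid(r.score)  # assumption: the gate is a fixed function of the score
        out = [e + gate * v for e, v in zip(out, r.evidence)]
    return out

def fuse(ego_map, queries, collaborators, k=1):
    """Update only the queried cells; every other ego feature is left untouched."""
    fused = dict(ego_map)
    for q in queries:
        responses = [c(q) for c in collaborators]
        responses = [r for r in responses if r is not None]
        fused[q.cell] = write_back(fused[q.cell], route_top_k(responses, k))
    return fused

# Toy usage: two heterogeneous collaborators answer one "deficit" query.
ego_map = {(0, 0): [1.0, 0.0], (0, 1): [0.0, 0.0]}
queries = [Query(cell=(0, 1), qtype="deficit")]

def lidar_agent(q):
    return Response("lidar", [2.0, 2.0], score=0.0)

def camera_agent(q):
    return Response("camera", [9.0, 9.0], score=-5.0)

fused = fuse(ego_map, queries, [lidar_agent, camera_agent], k=1)
print(fused[(0, 1)])  # only the higher-scoring response is written back
```

In this sketch the collaborators never send a full feature map, only one evidence vector per query, which is the property the abstract credits for the reduced communication volume. A new collaborator is simply another callable in the list, loosely mirroring the idea that compatibility is defined at the level of query responses.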
Problem

Research questions and friction points this paper is trying to address.

heterogeneous collaborative perception
feature fusion
sensor heterogeneity
scalable deployment
collaborative perception
Innovation

Methods, ideas, or system contributions that make the work stand out.

heterogeneous collaborative perception
ego-guided retrieval
typed sparse evidence
zero-training insertion
checkpoint merging
Chen Li
National Key Laboratory of Multispectral Information Intelligent Processing, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology
Shengrong Yuan
National Key Laboratory of Multispectral Information Intelligent Processing, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology
Jialong Zuo
Zhejiang University
Speech Synthesis, Voice Conversion
Xinzhong Zhu
Zhejiang Normal University
Nong Sang
Huazhong University of Science and Technology
Computer Vision and Pattern Recognition
Changxin Gao
National Key Laboratory of Multispectral Information Intelligent Processing, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology