AI Summary
To address feature misalignment and the severe performance degradation caused by communication mismatches in collaborative perception, this paper proposes a hybrid architecture that integrates feature-level intermediate fusion with semantic-driven object-level correction. It first reveals the complementary nature of intermediate and late fusion, then introduces a dual-branch decoupled design: one branch employs selective feature fusion to reduce communication overhead, while the other uses semantic-guided spatial displacement estimation and a lightweight object-level correction network to robustly rectify pose estimation errors. The resulting jointly optimized hybrid fusion paradigm improves AP@0.7 by approximately 19% over its baseline under extreme communication mismatch while reducing communication volume by more than 5×, significantly outperforming existing state-of-the-art methods.
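The summary does not say how the fusion branch chooses which features to send. A common pattern in intermediate fusion is to transmit only the bird's-eye-view (BEV) cells with the highest confidence and fuse them into the ego map. The minimal sketch below illustrates that idea under stated assumptions: the grid layout, the `keep_ratio` knob, top-k selection and element-wise max fusion are all hypothetical choices, not CoRA's actual selection or fusion modules. Here the communication saving is simply total cells divided by transmitted cells, ignoring index overhead.

```python
from typing import List, Tuple

# Hypothetical layout: [H][W][C] BEV feature map stored as nested lists.
Grid = List[List[List[float]]]


def select_cells(confidence: List[List[float]], keep_ratio: float) -> List[Tuple[int, int]]:
    """Return (row, col) indices of the highest-confidence cells (assumed top-k rule)."""
    ranked = sorted(
        ((conf, r, c) for r, row in enumerate(confidence) for c, conf in enumerate(row)),
        reverse=True,
    )
    k = max(1, int(len(ranked) * keep_ratio))  # always send at least one cell
    return [(r, c) for _, r, c in ranked[:k]]


def fuse_selected(ego: Grid, received: Grid, cells: List[Tuple[int, int]]) -> Grid:
    """Element-wise max fusion, applied only at the cells the collaborator transmitted."""
    fused = [[list(vec) for vec in row] for row in ego]  # copy so the ego map is untouched
    for r, c in cells:
        fused[r][c] = [max(a, b) for a, b in zip(ego[r][c], received[r][c])]
    return fused


if __name__ == "__main__":
    confidence = [[0.9, 0.1, 0.2], [0.05, 0.8, 0.3]]
    ego = [[[0.0, 0.0] for _ in range(3)] for _ in range(2)]
    received = [[[1.0, 2.0] for _ in range(3)] for _ in range(2)]
    cells = select_cells(confidence, keep_ratio=0.5)
    fused = fuse_selected(ego, received, cells)
    print(cells)  # [(0, 0), (1, 1), (1, 2)]
    print(6 / len(cells))  # 2.0, i.e. half the cells are transmitted
```

Lowering `keep_ratio` trades detection coverage for bandwidth; a learned confidence map would replace the hand-written one in a real system.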
Abstract
Collaborative perception has garnered significant attention as a crucial technology for overcoming the perceptual limitations of single-agent systems. Many state-of-the-art (SOTA) methods achieve both communication efficiency and high performance through intermediate fusion. However, they share a critical vulnerability: their performance degrades under adverse communication conditions because data transmission induces misalignment, which severely hampers their practical deployment. To bridge this gap, we re-examine different fusion paradigms and find that the strengths of intermediate and late fusion are not a trade-off but a complementary pairing. Based on this key insight, we propose CoRA, a novel collaborative robust architecture that uses a hybrid approach to decouple performance from robustness at low communication cost. CoRA is composed of two components: a feature-level fusion branch and an object-level correction branch. The first branch selects critical features and fuses them efficiently to ensure both performance and scalability. The second branch leverages semantic relevance to correct spatial displacements, guaranteeing resilience against pose errors. Experiments demonstrate the superiority of CoRA. Under extreme scenarios, CoRA improves upon its baseline by approximately 19% in AP@0.7 while using more than 5× less communication volume, making it a promising solution for robust collaborative perception.
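To make the idea of using semantic relevance to correct spatial displacement concrete, here is a deliberately simple, rule-based stand-in. It is not CoRA's lightweight correction network. It assumes that the collaborator's boxes have already been projected into the ego frame with a noisy pose, and that the pose error is a pure 2-D translation (rotation is ignored). Each received box is matched to the nearest ego box of the same class within a hypothetical `max_dist` radius, and the mean offset is then applied to every received box. `Box`, `estimate_offset` and `correct_boxes` are illustrative names only.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Box:
    x: float    # BEV center x in the ego frame (metres)
    y: float    # BEV center y in the ego frame (metres)
    label: str  # semantic class; only same-class boxes may be matched


def estimate_offset(ego: List[Box], received: List[Box],
                    max_dist: float = 3.0) -> Optional[Tuple[float, float]]:
    """Mean (dx, dy) from received boxes to their nearest same-class ego box."""
    offsets = []
    for rb in received:
        candidates = [eb for eb in ego if eb.label == rb.label]
        if not candidates:
            continue  # no semantic anchor for this box
        nearest = min(candidates, key=lambda eb: math.hypot(eb.x - rb.x, eb.y - rb.y))
        if math.hypot(nearest.x - rb.x, nearest.y - rb.y) <= max_dist:
            offsets.append((nearest.x - rb.x, nearest.y - rb.y))
    if not offsets:
        return None  # nothing matched; caller leaves boxes uncorrected
    n = len(offsets)
    return (sum(dx for dx, _ in offsets) / n, sum(dy for _, dy in offsets) / n)


def correct_boxes(ego: List[Box], received: List[Box], max_dist: float = 3.0) -> List[Box]:
    """Shift all received boxes, including unmatched ones, by the estimated offset."""
    offset = estimate_offset(ego, received, max_dist)
    if offset is None:
        return list(received)
    dx, dy = offset
    return [Box(b.x + dx, b.y + dy, b.label) for b in received]


if __name__ == "__main__":
    ego = [Box(10.0, 0.0, "car"), Box(20.0, 5.0, "car"), Box(5.0, 5.0, "pedestrian")]
    # Collaborator's view, displaced by a (-1.0, +0.5) pose error; the last car is
    # visible only to the collaborator, so it is corrected but not used for matching.
    received = [Box(9.0, 0.5, "car"), Box(19.0, 5.5, "car"), Box(40.0, 40.0, "car")]
    print(estimate_offset(ego, received))  # (1.0, -0.5)
    print(correct_boxes(ego, received)[2])  # Box(x=41.0, y=39.5, label='car')
```

Restricting matches to the same class is what keeps, say, a displaced pedestrian from being pulled onto a nearby car. A learned module could additionally estimate rotation and weight matches by detection confidence.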