🤖 AI Summary
This work addresses bandwidth-constrained collaborative perception, where existing approaches struggle to balance local saliency against global context. The authors propose WhisperNet, a receiver-driven framework in which senders publish lightweight saliency metadata and the receiver builds a global request plan that dynamically budgets feature transmission across agents, retrieving only the most informative features. A collaborative feature routing module then aligns related cross-agent messages before fusion to ensure structural consistency. The authors present this receiver-centric global coordination as a novel paradigm that jointly decides what to share and where, addressing the limitations of fixed-rate compression and object-centric spatial selection. On OPV2V, WhisperNet improves AP@0.7 by 2.4% with only 0.5% of the communication cost; used as a plug-and-play component, it boosts strong baselines with only 5% of full bandwidth while remaining robust to localization noise.
📝 Abstract
Collaborative perception is vital for autonomous driving yet remains constrained by tight communication budgets. Earlier work reduced bandwidth by compressing full feature maps with fixed-rate encoders, which adapt poorly to changing environments. Later spatial selection methods improved efficiency by focusing on salient regions, but this object-centric approach often sacrifices global context, weakening holistic scene understanding. To overcome these limitations, we introduce \textit{WhisperNet}, a bandwidth-aware framework built on a novel, receiver-centric paradigm for global coordination across agents. Senders generate lightweight saliency metadata, while the receiver formulates a global request plan that dynamically allocates the transmission budget across agents and features, retrieving only the most informative features. A collaborative feature routing module then aligns related messages before fusion to ensure structural consistency. Extensive experiments show that WhisperNet achieves state-of-the-art performance, improving AP@0.7 on OPV2V by 2.4\% with only 0.5\% of the communication cost. As a plug-and-play component, it boosts strong baselines with merely 5\% of full bandwidth while maintaining robustness under localization noise. These results demonstrate that globally coordinated allocation across \textit{what} and \textit{where} to share is the key to achieving efficient collaborative perception.
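To make the idea of a shared, receiver-side budget concrete, the following is a minimal sketch and not the WhisperNet implementation. It assumes each sender reports a 2D saliency map of equal shape, and that the receiver simply requests the highest-scoring cells across all senders until a fraction of the total cell count is used. The function name `plan_requests`, the `budget_fraction` parameter, and the plain top-k selection rule are all hypothetical choices made for illustration; the abstract does not specify how the request plan is computed.

```python
import numpy as np

def plan_requests(saliency_maps, budget_fraction):
    """Hypothetical receiver-side planner: one global budget shared by all senders.

    saliency_maps: dict mapping sender id -> 2D array of saliency scores (same shape).
    budget_fraction: share of all cells (summed over senders) that may be requested.
    Returns a dict mapping sender id -> boolean mask of the cells to request.
    """
    sender_ids = list(saliency_maps)
    stacked = np.stack([saliency_maps[s] for s in sender_ids])  # shape (N, H, W)

    # Budget is counted in cells and kept within [1, total number of cells].
    k = int(budget_fraction * stacked.size)
    k = min(max(k, 1), stacked.size)

    # Rank every cell of every sender in one pool, so a sender with little
    # salient content may receive no budget at all.
    flat = stacked.ravel()
    top = np.argpartition(flat, -k)[-k:]

    mask = np.zeros(flat.shape, dtype=bool)
    mask[top] = True
    mask = mask.reshape(stacked.shape)
    return {s: mask[i] for i, s in enumerate(sender_ids)}


if __name__ == "__main__":
    maps = {
        "car_a": np.array([[0.9, 0.1], [0.2, 0.8]]),
        "car_b": np.array([[0.3, 0.05], [0.4, 0.1]]),
    }
    plan = plan_requests(maps, budget_fraction=0.25)
    for sender, requested in plan.items():
        print(sender, int(requested.sum()), "cells requested")
```

In this toy input both requested cells come from `car_a`. A per-sender top-k with the same total budget would instead force one cell from each sender, which is the contrast with global coordination that the abstract draws.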