🤖 AI Summary
This work addresses the loss of robustness and generalization that cooperative perception systems suffer when real-world sensing data is affected by diverse and unpredictable corruptions. To this end, we propose CoopDiff, the first framework to integrate diffusion models into cooperative perception, which uses a denoising mechanism to maintain performance under various corruption types. Our approach adopts a teacher-student paradigm: the teacher module generates clean supervision signals by fusing voxel-level features weighted by Quality of Interest, while the student module uses a dual-branch diffusion architecture with an Ego-Guided Cross-Attention mechanism to reconstruct features adaptively under degraded conditions. On the OPV2Vn and DAIR-V2Xn benchmarks, CoopDiff outperforms existing methods across corruption types, lowers the relative corruption error, and offers a tunable trade-off between accuracy and inference efficiency.
📝 Abstract
Cooperative perception lets agents share information to expand coverage and improve scene understanding. However, in real-world scenarios, diverse and unpredictable corruptions undermine its robustness and generalization. To address these challenges, we introduce CoopDiff, a diffusion-based cooperative perception framework that mitigates corruptions via a denoising mechanism. CoopDiff adopts a teacher-student paradigm. The Quality-Aware Teacher performs voxel-level early fusion with Quality of Interest weighting and semantic guidance, then produces clean supervision features via a diffusion denoiser. The Dual-Branch Diffusion Student encodes the ego and cooperative streams separately and learns to reconstruct the teacher's clean targets. During decoding, an Ego-Guided Cross-Attention mechanism adaptively integrates ego and cooperative features, keeping the two streams balanced under degradation. We evaluate CoopDiff on two newly constructed multi-degradation benchmarks, OPV2Vn and DAIR-V2Xn, each incorporating six corruption types that span environmental and sensor-level distortions. Benefiting from the inherent denoising properties of diffusion, CoopDiff consistently outperforms prior methods across all degradation types and lowers the relative corruption error. Furthermore, it offers a tunable balance between precision and inference efficiency.
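
To make the idea of ego-guided cross-attention concrete, here is a minimal NumPy sketch of one common way such a mechanism can be built: ego features form the queries, cooperative features form the keys and values, and the attended result is added back to the ego features. This is an illustrative assumption, not the CoopDiff implementation. The abstract does not specify the layer design, and the function name, the projection matrices `w_q`, `w_k`, `w_v`, the residual connection, and the tensor shapes are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def ego_guided_cross_attention(ego, coop, w_q, w_k, w_v):
    """Hypothetical single-head cross-attention (assumed design).

    ego:  (n_ego, d)  ego-agent feature tokens, used as queries
    coop: (n_coop, d) cooperative feature tokens, used as keys/values
    Returns the fused ego features and the attention weights.
    """
    q = ego @ w_q                       # queries come from the ego stream
    k = coop @ w_k                      # keys come from the cooperative stream
    v = coop @ w_v                      # values come from the cooperative stream
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)     # each ego token weights all coop tokens
    fused = ego + attn @ v              # residual keeps the ego signal intact
    return fused, attn


# Toy inputs: 4 ego tokens, 6 cooperative tokens, feature width 8.
rng = np.random.default_rng(0)
n_ego, n_coop, d = 4, 6, 8
ego = rng.normal(size=(n_ego, d))
coop = rng.normal(size=(n_coop, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

fused, attn = ego_guided_cross_attention(ego, coop, w_q, w_k, w_v)
print(fused.shape, attn.shape)
```

In this sketch the ego stream decides how much of each cooperative token to admit, so a corrupted cooperative feature can be down-weighted rather than averaged in blindly. Whether CoopDiff uses a residual connection, multiple heads, or spatially local attention is not stated in the abstract.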