🤖 AI Summary
To address degraded multi-vehicle collaborative perception in vehicle-infrastructure cooperative systems caused by bandwidth constraints and sensor calibration errors, this paper proposes mmCooper, a multi-stage adaptive collaborative perception framework. Methodologically: (1) a multi-stage collaboration strategy dynamically and adaptively balances which intermediate- and late-stage information agents share, reducing feature transmission overhead; (2) a multi-scale contextual fusion mechanism in the intermediate stage provides robustness to misalignment, while received detection results are calibrated in the late stage to mitigate the impact of calibration inaccuracies; and (3) communication and perception objectives are optimized jointly. Evaluated on both real-world and synthetic datasets, the framework reduces communication overhead by 37% compared to state-of-the-art methods, while improving mean Average Precision (mAP) by 12.6% under calibration errors. These results demonstrate significant gains in collaboration efficiency and robustness.
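The paper does not publish its selection algorithm here, but the core bandwidth-saving idea in intermediate-stage collaboration is commonly realized by transmitting only the feature-map cells an agent is confident about. The sketch below is a hypothetical, minimal illustration of that pattern (the function name `select_features`, the threshold value, and the map shapes are illustrative assumptions, not mmCooper's actual implementation):

```python
import numpy as np

def select_features(feature_map, conf_map, threshold=0.5):
    """Keep only feature cells whose confidence exceeds the threshold;
    everything else is dropped before transmission to other agents."""
    mask = conf_map > threshold
    idx = np.argwhere(mask)           # (n_kept, 2) sparse cell coordinates
    values = feature_map[mask]        # (n_kept, C) feature vectors to send
    return idx, values, mask.mean()   # mean of the mask = fraction transmitted

rng = np.random.default_rng(0)
feat = rng.normal(size=(32, 32, 64))  # toy BEV feature map (H, W, C)
conf = rng.uniform(size=(32, 32))     # toy per-cell confidence scores
idx, vals, ratio = select_features(feat, conf, threshold=0.8)
print(f"transmitting {ratio:.0%} of cells")
```

A receiving agent would scatter the `(idx, values)` pairs back into a dense map before fusion; raising the threshold trades perception coverage for bandwidth, which is the balance the framework tunes adaptively.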
📝 Abstract
Collaborative perception significantly enhances individual vehicle perception performance through the exchange of sensory information among agents. However, real-world deployment faces challenges due to bandwidth constraints and inevitable calibration errors during information exchange. To address these issues, we propose mmCooper, a novel multi-agent, multi-stage, communication-efficient, and collaboration-robust cooperative perception framework. Our framework leverages a multi-stage collaboration strategy that dynamically and adaptively balances intermediate- and late-stage information to share among agents, enhancing perceptual performance while maintaining communication efficiency. To support robust collaboration despite potential misalignments and calibration errors, our framework captures multi-scale contextual information for robust fusion in the intermediate stage and calibrates the received detection results to improve accuracy in the late stage. We validate the effectiveness of mmCooper through extensive experiments on real-world and simulated datasets. The results demonstrate the superiority of our proposed framework and the effectiveness of each component.