🤖 AI Summary
To address two key bottlenecks in camouflaged object detection (COD) and salient object detection (SOD), namely insufficient intra-layer channel-wise interaction and the difficulty of jointly modeling boundary and region information, this paper proposes a Channel Information Interaction Module (CIIM) and a prior-guided collaborative decoding architecture. CIIM enables cross-channel feature reorganization via horizontal-vertical channel integration, while the decoder couples dual-path prior generation (boundary and region) with hybrid attention-based calibration to jointly optimize structural and semantic cues. Notably, this is the first work to unify COD and SOD under a single framework, demonstrating strong cross-task generalization. The method achieves state-of-the-art performance on four COD benchmarks and transfers successfully to diverse downstream tasks, including SOD, polyp segmentation, transparent object detection, and industrial defect detection, validating its robustness and versatility. Code and comprehensive experimental results are publicly available.
📝 Abstract
Camouflaged Object Detection (COD) is a significant challenge in computer vision, dedicated to identifying and segmenting objects that are visually highly integrated with their backgrounds. Current mainstream methods have made progress in cross-layer feature fusion, but two critical issues persist in the decoding stage. The first is insufficient cross-channel information interaction within same-layer features, which limits feature expressiveness. The second is the inability to effectively co-model boundary and region information, which makes it difficult to accurately reconstruct complete object regions and sharp boundaries. To address the first issue, we propose the Channel Information Interaction Module (CIIM), which introduces a horizontal-vertical integration mechanism in the channel dimension. This module reorganizes and exchanges features across channels to effectively capture complementary cross-channel information. To address the second issue, we construct a collaborative decoding architecture guided by prior knowledge. This architecture generates boundary priors and object localization maps through the Boundary Extraction (BE) and Region Extraction (RE) modules, then employs hybrid attention to collaboratively calibrate the decoded features, effectively overcoming semantic ambiguity and imprecise boundaries. Additionally, a Multi-scale Enhancement (MSE) module enriches contextual feature representations. Extensive experiments on four COD benchmark datasets validate the effectiveness and state-of-the-art performance of the proposed model. We further transfer the model to the Salient Object Detection (SOD) task and demonstrate its adaptability to downstream tasks including polyp segmentation, transparent object detection, and industrial and road defect detection. Code and experimental results are publicly available at: https://github.com/akuan1234/ARNet-v2.
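To make the "horizontal-vertical integration in the channel dimension" idea concrete, here is a minimal stdlib-Python sketch of one plausible reading of that reorganization step: view the C channels as a grid of `groups` rows, then read the grid column by column so that channels far apart in the original ordering become neighbors for subsequent interaction. This is an illustrative assumption, not the paper's actual CIIM implementation (the function name `channel_reorganize` and the grid interpretation are ours; see the released code for the real module).

```python
def channel_reorganize(channels, groups):
    """Reorder a flat list of per-channel features by viewing it as a
    groups x (C // groups) grid ("horizontal" rows) and then reading it
    column by column ("vertical" integration)."""
    c = len(channels)
    assert c % groups == 0, "channel count must be divisible by groups"
    cols = c // groups
    # Horizontal view: split the channel list into rows of the grid.
    grid = [channels[r * cols:(r + 1) * cols] for r in range(groups)]
    # Vertical integration: transpose, interleaving channels across groups.
    return [grid[r][k] for k in range(cols) for r in range(groups)]

# Toy example with 8 channel indices split into 2 groups:
print(channel_reorganize(list(range(8)), groups=2))
# -> [0, 4, 1, 5, 2, 6, 3, 7]: group-0 channels now alternate with group-1
```

In a real network this permutation would be applied to feature-map channels before a convolution, so the following layer mixes information from channels that previously never interacted within the same group.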