🤖 AI Summary
This paper proposes SSNet to address three key challenges in RGB-D salient object detection (SOD): insufficient modeling of cross-modal global dependencies, inadequate exploitation of saliency priors, and poor robustness to low-quality depth maps. First, it introduces state space models (SSMs) into RGB-D SOD and designs a Cross-Modal Selective Scan (CM-S6) mechanism to capture long-range dependencies within and across modalities with linear complexity. Second, it develops a Saliency Enhancement Module (SEM) that integrates hierarchical saliency priors with deep features. Third, it proposes an adaptive depth-map contrast enhancement strategy that improves robustness to noisy or degraded depth. Extensive experiments on seven benchmark datasets demonstrate that SSNet consistently outperforms state-of-the-art methods, with more accurate localization in complex scenes and greater stability under degraded depth conditions.
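The summary does not spell out how the cross-modal selective scan works internally. As a rough intuition only, one way a single linear-time recurrence can carry state across modalities is to interleave RGB and depth token sequences and run an S6-style diagonal selective scan over the joint sequence. Everything below (the function names, the diagonal parameterization, the interleaving order) is an illustrative assumption, not the paper's actual implementation:

```python
import numpy as np

def selective_scan(x, A, B, C, delta):
    """Minimal diagonal selective scan (S6-style): a linear recurrence
    whose decay and input/output maps vary per token."""
    L, D = x.shape
    h = np.zeros(D)
    ys = np.empty((L, D))
    for t in range(L):
        a = np.exp(delta[t] * A)            # input-dependent decay (A < 0)
        h = a * h + delta[t] * B[t] * x[t]  # selective state update
        ys[t] = C[t] * h                    # token-wise readout
    return ys

def cross_modal_scan(rgb, depth, A, B, C, delta):
    """Interleave RGB and depth tokens so one scan propagates state
    across both modalities, then split the outputs back per modality."""
    L, D = rgb.shape
    joint = np.empty((2 * L, D))
    joint[0::2], joint[1::2] = rgb, depth
    y = selective_scan(joint, A, B, C, delta)
    return y[0::2], y[1::2]
```

Because the recurrence is linear in its input, the whole scan costs O(L) per channel, which is the complexity advantage the summary attributes to SSMs over attention.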
📝 Abstract
Salient object detection (SOD) in RGB-D images is an essential task in computer vision, enabling applications in scene understanding, robotics, and augmented reality. However, existing methods struggle to capture global dependencies across modalities, lack comprehensive saliency priors from both RGB and depth data, and are ineffective in handling low-quality depth maps. To address these challenges, we propose SSNet, a saliency-prior and state space model (SSM)-based network for the RGB-D SOD task. Unlike existing convolution- or transformer-based approaches, SSNet introduces an SSM-based multi-modal multi-scale decoder module to efficiently capture both intra- and inter-modal global dependencies with linear complexity. Specifically, we propose a cross-modal selective scan SSM (CM-S6) mechanism, which effectively captures global dependencies between different modalities. Furthermore, we introduce a saliency enhancement module (SEM) that integrates three saliency priors with deep features to refine feature representation and improve the localization of salient objects. To further address the issue of low-quality depth maps, we propose an adaptive contrast enhancement technique that dynamically refines depth maps, making them more suitable for the RGB-D SOD task. Extensive quantitative and qualitative experiments on seven benchmark datasets demonstrate that SSNet outperforms state-of-the-art methods.
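The abstract does not specify the form of the adaptive contrast enhancement. As a minimal sketch of the general idea, assuming a percentile-based stretch followed by a gamma chosen from the map's own brightness (all thresholds and the gamma rule below are illustrative assumptions, not the paper's method):

```python
import numpy as np

def adaptive_depth_enhance(depth, low_pct=2.0, high_pct=98.0):
    """Illustrative adaptive contrast enhancement for a depth map.

    1. Clip outliers with a percentile stretch to [0, 1].
    2. Pick a gamma from the mean brightness so dark, low-contrast
       maps are boosted while already-balanced maps are left alone.
    """
    d = depth.astype(np.float64)
    lo, hi = np.percentile(d, [low_pct, high_pct])
    if hi <= lo:                      # degenerate (near-constant) map
        return np.zeros_like(d)
    d = np.clip((d - lo) / (hi - lo), 0.0, 1.0)
    # gamma < 1 brightens dark maps; clip keeps the correction gentle
    gamma = np.clip(np.log(0.5) / np.log(d.mean() + 1e-6), 0.5, 2.0)
    return d ** gamma
```

A data-dependent gamma like this is "adaptive" in the sense the abstract uses: the same transform leaves a well-exposed depth map essentially unchanged while stretching a dark or flat one.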