SSNet: Saliency Prior and State Space Model-based Network for Salient Object Detection in RGB-D Images

📅 2025-03-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper proposes SSNet to address three key challenges in RGB-D salient object detection (SOD): insufficient modeling of cross-modal global dependencies, underexploited saliency priors, and poor robustness to low-quality depth maps. First, it introduces state space models (SSMs) into RGB-D SOD and designs a cross-modal selective scan (CM-S6) mechanism that captures long-range dependencies within and across modalities with linear complexity. Second, it develops a saliency enhancement module (SEM) that integrates hierarchical saliency priors with deep features to refine representations and improve localization. Third, it proposes an adaptive depth-map contrast enhancement strategy that improves robustness to depth noise and degradation. Extensive experiments on seven benchmark datasets show that SSNet consistently outperforms state-of-the-art methods, with notable gains in localization accuracy in complex scenes and greater stability under degraded depth.

📝 Abstract
Salient object detection (SOD) in RGB-D images is an essential task in computer vision, enabling applications in scene understanding, robotics, and augmented reality. However, existing methods struggle to capture global dependency across modalities, lack comprehensive saliency priors from both RGB and depth data, and are ineffective in handling low-quality depth maps. To address these challenges, we propose SSNet, a saliency-prior and state space model (SSM)-based network for the RGB-D SOD task. Unlike existing convolution- or transformer-based approaches, SSNet introduces an SSM-based multi-modal multi-scale decoder module to efficiently capture both intra- and inter-modal global dependency with linear complexity. Specifically, we propose a cross-modal selective scan SSM (CM-S6) mechanism, which effectively captures global dependency between different modalities. Furthermore, we introduce a saliency enhancement module (SEM) that integrates three saliency priors with deep features to refine feature representation and improve the localization of salient objects. To further address the issue of low-quality depth maps, we propose an adaptive contrast enhancement technique that dynamically refines depth maps, making them more suitable for the RGB-D SOD task. Extensive quantitative and qualitative experiments on seven benchmark datasets demonstrate that SSNet outperforms state-of-the-art methods.
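The decoder's claim of global dependency at linear complexity rests on the selective-scan SSM recurrence popularized by Mamba-style models. As a rough illustration only, the sketch below implements a minimal fixed-parameter (non-selective) diagonal SSM scan; the function name `selective_scan` and the parameters `A`, `B`, `C` are hypothetical, and the paper's CM-S6 additionally makes these parameters input-dependent and scans across both the RGB and depth streams:

```python
import numpy as np

def selective_scan(x, A, B, C):
    """Linear-time scan of a diagonal state space model over a 1-D sequence:
        h_t = A * h_{t-1} + B * x_t   (elementwise, diagonal transition)
        y_t = C . h_t                 (readout)
    x: (T,) input sequence; A, B, C: (N,) diagonal SSM parameters.
    """
    T = x.shape[0]
    h = np.zeros(A.shape[0])       # hidden state
    y = np.zeros(T)
    for t in range(T):             # one pass over the sequence: O(T * N)
        h = A * h + B * x[t]
        y[t] = C @ h
    return y

# Tiny example: one-dimensional state with decay 0.5.
y = selective_scan(np.array([1.0, 1.0]),
                   A=np.array([0.5]), B=np.array([1.0]), C=np.array([1.0]))
```

Each output position aggregates information from the entire preceding sequence through the recurring state `h`, which is what lets an SSM decoder model global context without the quadratic cost of attention.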
Problem

Research questions and friction points this paper is trying to address.

How to capture global dependencies across RGB and depth modalities effectively
How to exploit saliency priors from both modalities to refine feature representation
How to handle low-quality depth maps that degrade detection performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

SSM-based multi-modal multi-scale decoder module
Cross-modal selective scan SSM (CM-S6) mechanism
Adaptive contrast enhancement for depth maps
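The page does not reproduce the paper's adaptive enhancement formula, but a common baseline for refining low-contrast depth maps is a percentile-based contrast stretch. The sketch below is a minimal illustration under that assumption; `stretch_depth` and its parameters are hypothetical names, not the authors' method:

```python
import numpy as np

def stretch_depth(depth, low_pct=2.0, high_pct=98.0):
    """Percentile-based contrast stretch for a depth map normalized to [0, 1].

    Clipping at the 2nd/98th percentiles discards outlier depth readings
    before rescaling, so sensor noise does not dominate the value range.
    """
    lo, hi = np.percentile(depth, [low_pct, high_pct])
    if hi - lo < 1e-6:                       # near-constant depth: nothing to stretch
        return np.zeros_like(depth)
    return np.clip((depth - lo) / (hi - lo), 0.0, 1.0)

# Usage: a flat, low-contrast depth map is remapped to the full [0, 1] range.
depth = np.linspace(0.2, 0.6, 100).reshape(10, 10)
enhanced = stretch_depth(depth)
```

An adaptive scheme, as described in the abstract, would additionally choose the stretch parameters per image based on the depth map's quality rather than using fixed percentiles.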