🤖 AI Summary
Weakly supervised semantic segmentation suffers significant performance degradation under low-light conditions, primarily because image degradation—noise, low contrast, and color distortion—compromises the reliability of class activation maps (CAMs) and yields ambiguous pseudo-labels. To address this, the authors propose DGKD-WLSS, a novel framework that integrates diffusion-guided knowledge distillation with depth-guided feature fusion. DGKD-WLSS is the first to leverage the strong denoising capability of diffusion models for cross-illumination knowledge transfer from a normal-light teacher to a low-light student. Additionally, it incorporates depth maps as illumination-invariant geometric priors to enhance feature discriminability and structural awareness. The method jointly optimizes weakly supervised learning, CAM refinement, and pseudo-label polishing. Extensive experiments on low-light weakly supervised segmentation benchmarks demonstrate state-of-the-art performance, with substantial gains in both segmentation accuracy and robustness. The code is publicly available.
📝 Abstract
Weakly-supervised semantic segmentation aims to assign category labels to each pixel using weak annotations, significantly reducing manual annotation costs. Although existing methods have achieved remarkable progress in well-lit scenarios, their performance degrades significantly in low-light environments due to two fundamental limitations: severe image quality degradation (e.g., low contrast, noise, and color distortion) and the inherent constraints of weak supervision. These factors collectively lead to unreliable class activation maps and semantically ambiguous pseudo-labels, ultimately compromising the model's ability to learn discriminative feature representations. To address these problems, we propose Diffusion-Guided Knowledge Distillation for Weakly-Supervised Low-light Semantic Segmentation (DGKD-WLSS), a novel framework that synergistically combines Diffusion-Guided Knowledge Distillation (DGKD) with Depth-Guided Feature Fusion (DGF2). DGKD aligns normal-light and low-light features via diffusion-based denoising and knowledge distillation, while DGF2 integrates depth maps as illumination-invariant geometric priors to enhance structural feature learning. Extensive experiments demonstrate the effectiveness of DGKD-WLSS, which achieves state-of-the-art performance on weakly supervised semantic segmentation tasks under low-light conditions. The source code has been released at: https://github.com/ChunyanWang1/DGKD-WLSS.
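To make the two components concrete, here is a minimal NumPy sketch of the general ideas behind them — a feature-alignment distillation loss between teacher and student, and a depth-weighted feature gating. This is an illustrative simplification, not the authors' implementation: the actual DGKD denoises student features with a diffusion model before alignment, and DGF2's fusion is learned; all function names, shapes, and the `alpha` parameter here are assumptions.

```python
import numpy as np

def distillation_loss(teacher_feat, student_feat):
    """Mean-squared alignment between normal-light teacher features and
    low-light student features (a stand-in for DGKD; the real method
    first denoises the student features with a diffusion model)."""
    return float(np.mean((teacher_feat - student_feat) ** 2))

def depth_guided_fusion(rgb_feat, depth_map, alpha=0.5):
    """Gate H x W x C features with a normalized H x W depth prior
    (a hypothetical stand-in for DGF2's learned fusion)."""
    d_range = depth_map.max() - depth_map.min() + 1e-8
    depth_norm = (depth_map - depth_map.min()) / d_range
    gate = alpha + (1.0 - alpha) * depth_norm[..., None]  # broadcast over channels
    return rgb_feat * gate

# Toy example: 8x8 spatial grid with 16 feature channels.
rng = np.random.default_rng(0)
teacher = rng.standard_normal((8, 8, 16))
student = rng.standard_normal((8, 8, 16))
depth = rng.random((8, 8))

loss = distillation_loss(teacher, student)
fused = depth_guided_fusion(student, depth)
```

In the paper's pipeline these pieces would sit inside the weakly supervised training loop, with the distillation term added to the segmentation objective; the sketch only conveys the data flow.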