🤖 AI Summary
Low-light nighttime images suffer from severely degraded visibility, hindering content perception. Existing methods are limited either by poor interpretability (data-driven approaches) or by oversimplified physical assumptions (model-driven approaches). To address this, we propose the first discrete-representation enhancement framework to integrate causal inference with vector quantization (VQ): (i) a high-quality visual dictionary (codebook) learned from well-lit images serves as a reliable prior; (ii) causal intervention modules operating at the pixel level (PCI) and the feature level (FCI) correct the distribution shift between degraded inputs and the dictionary; (iii) a Low-frequency Selective Attention Gating (LSAG) mechanism within FCI and a High-frequency Detail Reconstruction Module (HDRM) in the decoder improve robustness and physical consistency under extreme low-light conditions. Extensive experiments show significant gains over state-of-the-art methods across multiple benchmarks, with superior visual quality, improved downstream task performance (e.g., detection and segmentation), and strong generalization.
📝 Abstract
Images captured in nighttime scenes suffer from severely reduced visibility, hindering effective content perception. Current low-light image enhancement (LLIE) methods face significant challenges: data-driven end-to-end mapping networks lack interpretability or rely on unreliable prior guidance and struggle under extremely dark conditions, while physics-based methods depend on simplified assumptions that often fail in complex real-world scenarios. To address these limitations, we propose CIVQLLIE, a novel framework that leverages discrete representation learning through causal reasoning. We achieve this with Vector Quantization (VQ), which maps continuous image features to a discrete codebook of visual tokens learned from large-scale high-quality images. This codebook serves as a reliable prior, encoding standardized brightness and color patterns that are independent of degradation. However, directly applying VQ to low-light images fails because of distribution shifts between degraded inputs and the learned codebook. We therefore propose a multi-level causal intervention approach that systematically corrects these shifts. First, during encoding, our Pixel-level Causal Intervention (PCI) module aligns low-level features with the brightness and color distributions expected by the codebook. Second, a Feature-aware Causal Intervention (FCI) mechanism with Low-frequency Selective Attention Gating (LSAG) identifies and enhances the channels most affected by illumination degradation, enabling accurate codebook token matching while improving the encoder's generalization through flexible feature-level intervention. Finally, during decoding, the High-frequency Detail Reconstruction Module (HDRM) exploits structural information preserved in the matched codebook representations to reconstruct fine details with deformable convolutions.
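To make the codebook-matching step concrete, below is a minimal PyTorch sketch of the two ideas the abstract revolves around: snapping encoder features to their nearest codebook tokens (with a straight-through gradient), preceded by a channel-gating step in the spirit of LSAG. All class names, shapes, and the gating design are illustrative assumptions, not the authors' released implementation; the real LSAG additionally performs low-frequency selection, which is omitted here.

```python
import torch
import torch.nn as nn


class VQCodebookMatcher(nn.Module):
    """Nearest-neighbour matching against a visual codebook.

    Each spatial feature vector is replaced by its closest codebook
    entry; this is the generic VQ step, not the paper's exact code.
    """
    def __init__(self, num_tokens: int = 1024, dim: int = 256):
        super().__init__()
        # In CIVQLLIE the codebook is pretrained on high-quality images;
        # here it is randomly initialised purely for illustration.
        self.codebook = nn.Embedding(num_tokens, dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) encoder features
        b, c, h, w = feats.shape
        flat = feats.permute(0, 2, 3, 1).reshape(-1, c)        # (BHW, C)
        # Squared L2 distance from each feature to every codebook entry
        d = (flat.pow(2).sum(1, keepdim=True)
             - 2 * flat @ self.codebook.weight.t()
             + self.codebook.weight.pow(2).sum(1))             # (BHW, K)
        idx = d.argmin(dim=1)                                  # token ids
        quant = self.codebook(idx).view(b, h, w, c).permute(0, 3, 1, 2)
        # Straight-through estimator: forward uses the quantised tokens,
        # backward passes gradients to the encoder unchanged.
        return feats + (quant - feats).detach()


class ChannelGate(nn.Module):
    """Toy stand-in for the LSAG idea: re-weight channels so those most
    affected by illumination degradation are emphasised before token
    matching. A simplified assumption, not the paper's module."""
    def __init__(self, dim: int = 256, reduction: int = 4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim // reduction), nn.ReLU(inplace=True),
            nn.Linear(dim // reduction, dim), nn.Sigmoid())

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        gate = self.mlp(feats.mean(dim=(2, 3)))                # (B, C)
        return feats * gate[:, :, None, None]


# Usage: gate the encoder features, then snap them to the codebook.
feats = torch.randn(2, 256, 16, 16)
quantised = VQCodebookMatcher()(ChannelGate()(feats))
print(quantised.shape)  # torch.Size([2, 256, 16, 16])
```

The sketch also illustrates why the intervention modules are needed: if degraded features sit far from the clean-image codebook distribution, `argmin` matches the wrong tokens, so the gating/intervention steps must pull features back into the codebook's expected range before quantization.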