🤖 AI Summary
Addressing the challenges of modeling complex noise in real-scene image denoising and the high computational cost of Transformer-based methods, this paper proposes a lightweight and efficient multi-view denoising framework. Our approach introduces three key innovations: (1) a context-guided receptance mechanism enabling linear-complexity, full-pixel interaction; (2) context-guided token shifting (CTS) jointly with bidirectional WKV (BiWKV) to effectively capture long-range dependencies; and (3) a frequency-mixing (FMix) module integrating frequency-domain and spatial-domain modeling to enhance characterization of noise distributions. Extensive experiments on multiple real-world denoising benchmarks demonstrate that our method consistently outperforms state-of-the-art approaches, achieving average PSNR/SSIM gains of 1.2–2.4 dB and 0.015–0.028, respectively, while accelerating inference by up to 40%. Moreover, it exhibits superior detail recovery capability.
📝 Abstract
Image denoising is essential in low-level vision applications such as photography and automated driving. Existing methods struggle with distinguishing complex noise patterns in real-world scenes and consume significant computational resources due to reliance on Transformer-based models. In this work, the Context-guided Receptance Weighted Key-Value (M) model is proposed, combining enhanced multi-view feature integration with efficient sequence modeling. Our approach introduces the Context-guided Token Shift (CTS) paradigm, which effectively captures local spatial dependencies and enhance the model's ability to model real-world noise distributions. Additionally, the Frequency Mix (FMix) module extracting frequency-domain features is designed to isolate noise in high-frequency spectra, and is integrated with spatial representations through a multi-view learning process. To improve computational efficiency, the Bidirectional WKV (BiWKV) mechanism is adopted, enabling full pixel-sequence interaction with linear complexity while overcoming the causal selection constraints. The model is validated on multiple real-world image denoising datasets, outperforming the existing state-of-the-art methods quantitatively and reducing inference time up to 40%. Qualitative results further demonstrate the ability of our model to restore fine details in various scenes.