Multi-View Learning with Context-Guided Receptance for Image Denoising

📅 2025-05-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Addressing the difficulty of modeling complex noise in real-scene image denoising and the high computational cost of Transformer-based methods, this paper proposes a lightweight, efficient multi-view denoising framework. The approach introduces three key innovations: (1) a context-guided receptance mechanism enabling full-pixel interaction at linear complexity; (2) Context-guided Token Shift (CTS) combined with Bidirectional WKV (BiWKV) to effectively capture long-range dependencies; and (3) a Frequency Mix (FMix) module integrating frequency-domain and spatial-domain modeling to better characterize noise distributions. Extensive experiments on multiple real-world denoising benchmarks demonstrate that the method consistently outperforms state-of-the-art approaches, achieving average PSNR/SSIM gains of 1.2–2.4 dB and 0.015–0.028, respectively, while accelerating inference by up to 40%. Moreover, it exhibits superior detail recovery.
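The token-shift idea in (2) can be illustrated with a toy one-dimensional sketch: each token is blended with its shifted neighbour under a learned gate. The function `context_token_shift` and its `gate` parameter are hypothetical stand-ins for the paper's context guidance, not the actual implementation:

```python
import numpy as np

def context_token_shift(x, gate):
    """Toy token shift: blend each token with its left neighbour.

    `gate` (per-token values in [0, 1]) stands in for the paper's
    'context guidance'; the first token has no left neighbour and is
    kept unchanged.
    """
    shifted = np.roll(x, 1, axis=0)  # np.roll returns a copy
    shifted[0] = x[0]
    return gate * x + (1.0 - gate) * shifted
```

With `gate = 0.5` every token becomes the mean of itself and its predecessor, which is the kind of local spatial mixing the summary describes.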

📝 Abstract
Image denoising is essential in low-level vision applications such as photography and automated driving. Existing methods struggle to distinguish complex noise patterns in real-world scenes and consume significant computational resources due to their reliance on Transformer-based models. In this work, the Context-guided Receptance Weighted Key-Value (CRWKV) model is proposed, combining enhanced multi-view feature integration with efficient sequence modeling. Our approach introduces the Context-guided Token Shift (CTS) paradigm, which effectively captures local spatial dependencies and enhances the model's ability to model real-world noise distributions. Additionally, the Frequency Mix (FMix) module is designed to extract frequency-domain features and isolate noise in the high-frequency spectrum, and is integrated with spatial representations through a multi-view learning process. To improve computational efficiency, the Bidirectional WKV (BiWKV) mechanism is adopted, enabling full pixel-sequence interaction with linear complexity while overcoming causal selection constraints. The model is validated on multiple real-world image denoising datasets, outperforming existing state-of-the-art methods quantitatively and reducing inference time by up to 40%. Qualitative results further demonstrate the ability of our model to restore fine details in various scenes.
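The frequency-domain idea behind FMix (isolating noise that concentrates in the high-frequency spectrum) can be sketched with a plain FFT band split. This is a minimal illustration under assumed names (`split_frequency_bands`, `cutoff`), not the paper's module:

```python
import numpy as np

def split_frequency_bands(image, cutoff=0.25):
    """Split a grayscale image into low- and high-frequency parts via FFT.

    Real-world noise tends to dominate the high-frequency spectrum, so a
    circular low-pass mask around the spectrum centre separates coarse
    structure from noise-heavy detail. `cutoff` is the kept low-frequency
    radius as a fraction of the smaller image side.
    """
    h, w = image.shape
    spectrum = np.fft.fftshift(np.fft.fft2(image))

    # Circular low-pass mask centred on the zero-frequency bin.
    yy, xx = np.ogrid[:h, :w]
    radius = cutoff * min(h, w)
    low_mask = (yy - h / 2) ** 2 + (xx - w / 2) ** 2 <= radius ** 2

    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = image - low  # residual carries fine detail plus noise
    return low, high
```

In a multi-view setting, the high-frequency residual would then be processed alongside the spatial features rather than simply discarded.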
Problem

Research questions and friction points this paper is trying to address.

Distinguish complex noise patterns in real-world images
Reduce computational cost of Transformer-based denoising models
Improve noise distribution modeling via multi-view learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Context-guided Token Shift captures spatial dependencies
Frequency Mix module isolates high-frequency noise
Bidirectional WKV enables linear complexity interactions
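The linear-complexity, bidirectional idea behind BiWKV can be sketched as a key-weighted, exponentially decayed running average run forward and backward over the pixel sequence. The functions `wkv_scan` and `biwkv` and the scalar `decay` are illustrative assumptions in the spirit of RWKV-style recurrences, not the paper's exact formulation:

```python
import numpy as np

def wkv_scan(k, v, decay=0.9):
    """One directional WKV-style pass in O(T): each output is a decayed,
    exp(k)-weighted running average of the values seen so far."""
    T = len(k)
    num, den = 0.0, 1e-8  # small den avoids division by zero at t = 0
    out = np.empty(T)
    for t in range(T):
        num = decay * num + np.exp(k[t]) * v[t]
        den = decay * den + np.exp(k[t])
        out[t] = num / den
    return out

def biwkv(k, v, decay=0.9):
    """Bidirectional variant: average a forward pass and a reversed pass
    so every position aggregates the whole sequence, still linear in T
    and free of the causal (left-to-right only) constraint."""
    fwd = wkv_scan(k, v, decay)
    bwd = wkv_scan(k[::-1], v[::-1], decay)[::-1]
    return 0.5 * (fwd + bwd)
```

Because both scans are single loops over the sequence, the cost grows linearly with the number of pixels, unlike the quadratic attention maps of standard Transformers.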
Binghong Chen
School of Mathematics, Harbin Institute of Technology, China
Tingting Chai
Harbin Institute of Technology
Biometrics · Pattern Recognition · Machine Learning
Wei Jiang
School of Mathematics, Harbin Institute of Technology, China
Yuanrong Xu
Faculty of Computing, Harbin Institute of Technology, China
Guanglu Zhou
Faculty of Computing, Harbin Institute of Technology, China
Xiangqian Wu
Faculty of Computing, Harbin Institute of Technology, China