An Efficient and Mixed Heterogeneous Model for Image Restoration

📅 2025-04-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of efficiently integrating heterogeneous architectures (CNNs, Transformers, and Mamba) in general image restoration, this paper proposes RestorMixer, a three-stage encoder-decoder hybrid model. It introduces a stage-wise heterogeneous fusion paradigm: (i) the high-resolution stage employs CNN blocks for fine-grained local feature extraction; (ii) the mid- and low-resolution stages combine a multi-directional scanning Mamba module, which strengthens long-range dependency modeling, with multi-scale window-based self-attention, which adapts feature representations dynamically. Each stage is tailored to the resolution of its input features. Extensive experiments show that RestorMixer achieves state-of-the-art performance across diverse image restoration tasks, including denoising, deblurring, and super-resolution. It also runs inference significantly faster than pure Transformer- or Mamba-based baselines while maintaining superior accuracy, giving a more favorable trade-off between restoration quality and computational cost.
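As a rough illustration of what "multi-scale windowed self-attention" operates on, the sketch below splits a small 2D grid into non-overlapping windows at several window sizes. Attention would then be computed inside each window. The function names and the window sizes (2 and 4) are assumptions made for this example and are not taken from the RestorMixer code.

```python
def window_partition(grid, window):
    """Split a 2D grid (list of rows) into non-overlapping window x window tiles."""
    height, width = len(grid), len(grid[0])
    if height % window or width % window:
        raise ValueError("grid size must be divisible by the window size")
    tiles = []
    for top in range(0, height, window):
        for left in range(0, width, window):
            # Each tile is the set of positions that would attend to each other.
            tile = [grid[r][left:left + window] for r in range(top, top + window)]
            tiles.append(tile)
    return tiles


def multi_scale_windows(grid, window_sizes=(2, 4)):
    """Partition the same grid at every window size (sizes are hypothetical)."""
    return {w: window_partition(grid, w) for w in window_sizes}


# A 4x4 grid whose entries are their own row-major indices.
grid = [[r * 4 + c for c in range(4)] for r in range(4)]
windows = multi_scale_windows(grid)
```

Smaller windows keep attention cheap and local, while larger windows let each position see more context; using several sizes is one plausible way to trade these off.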

📝 Abstract
Image restoration (IR), as a fundamental multimedia data processing task, has a significant impact on downstream visual applications. In recent years, researchers have focused on developing general-purpose IR models capable of handling diverse degradation types, thereby reducing the cost and complexity of model development. Current mainstream approaches are based on three architectural paradigms: CNNs, Transformers, and Mambas. CNNs excel in efficient inference, whereas Transformers and Mamba excel at capturing long-range dependencies and modeling global contexts. While each architecture has demonstrated success in specialized, single-task settings, limited efforts have been made to effectively integrate heterogeneous architectures to jointly address diverse IR challenges. To bridge this gap, we propose RestorMixer, an efficient and general-purpose IR model based on mixed-architecture fusion. RestorMixer adopts a three-stage encoder-decoder structure, where each stage is tailored to the resolution and feature characteristics of the input. In the initial high-resolution stage, CNN-based blocks are employed to rapidly extract shallow local features. In the subsequent stages, we integrate a refined multi-directional scanning Mamba module with a multi-scale window-based self-attention mechanism. This hierarchical and adaptive design enables the model to leverage the strengths of CNNs in local feature extraction, Mamba in global context modeling, and attention mechanisms in dynamic feature refinement. Extensive experimental results demonstrate that RestorMixer achieves leading performance across multiple IR tasks while maintaining high inference efficiency. The official code can be accessed at https://github.com/ClimBin/RestorMixer.
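The stage-wise assignment described in the abstract can be sketched as a small planning function: the first (highest-resolution) stage gets CNN blocks, and later stages get the Mamba plus window-attention mixer. This is only a sketch under stated assumptions: the 2x downsampling per stage, the channel doubling, the base width of 32, and all names (`StagePlan`, `plan_stages`) are hypothetical and do not come from the official repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StagePlan:
    index: int      # 0 is the highest-resolution stage
    height: int
    width: int
    channels: int
    mixer: str      # which block family handles this stage


def plan_stages(height, width, base_channels=32, num_stages=3):
    """Assign a block family to each encoder stage (all sizes are assumptions)."""
    plans = []
    for i in range(num_stages):
        scale = 2 ** i  # assumed 2x spatial downsampling between stages
        # CNN blocks at full resolution, Mamba + window attention afterwards.
        mixer = "cnn" if i == 0 else "mamba+window_attention"
        plans.append(
            StagePlan(i, height // scale, width // scale, base_channels * scale, mixer)
        )
    return plans


for stage in plan_stages(256, 256):
    print(stage)
```

The motivation for this split is cost: the number of spatial positions is largest at the first stage, where convolution is cheapest per position, and shrinks by the time the global mixers are applied.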
Problem

Research questions and friction points this paper is trying to address.

Develop a general-purpose image restoration model for diverse degradations
Integrate CNNs, Transformers, and Mamba to address heterogeneous IR challenges
Balance local feature extraction and global context modeling efficiently
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixed CNN, Transformer, Mamba architectures
Hierarchical adaptive encoder-decoder design
Multi-directional Mamba with window attention
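To make "multi-directional scanning" concrete, the sketch below builds four common traversal orders of an H x W feature grid (row-major and column-major, each forward and backward) and shows how a scanned sequence can be scattered back to its original layout. A state-space model such as Mamba processes tokens as a 1D sequence, so the scan order decides which neighbours are adjacent in that sequence. The choice of these four directions is an assumption for illustration; this summary does not state which directions RestorMixer uses.

```python
def scan_orders(height, width):
    """Return index orders for four assumed scan directions over an H x W grid."""
    row_major = [r * width + c for r in range(height) for c in range(width)]
    col_major = [r * width + c for c in range(width) for r in range(height)]
    return {
        "row_forward": row_major,
        "row_backward": row_major[::-1],
        "col_forward": col_major,
        "col_backward": col_major[::-1],
    }


def restore_order(sequence, order):
    """Scatter a scanned sequence back to row-major positions."""
    out = [None] * len(sequence)
    for position, index in enumerate(order):
        out[index] = sequence[position]
    return out


orders = scan_orders(2, 3)
tokens = list("abcdef")  # stand-ins for flattened feature vectors
scanned = [tokens[i] for i in orders["col_forward"]]
recovered = restore_order(scanned, orders["col_forward"])
```

After each direction is processed separately, the outputs are restored to a shared layout like this so they can be merged, for example by summation.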
Authors

Yubin Gu
Ph.D. Candidate, Xiamen University
Multi-Modal Learning, Low-Level Vision, Computer Vision

Yuan Meng
MAC Lab, Xiamen University, China

Kaihang Zheng
MAC Lab, Xiamen University, China

Xiaoshuai Sun
MAC Lab, Xiamen University, China

Jiayi Ji
Rutgers University

Weijian Ruan
President of R&D, Smart City Research Institute, CETC
Artificial Intelligence, Computer Vision, Multimedia Analysis

Liujuan Cao
MAC Lab, Xiamen University, China

Rongrong Ji
MAC Lab, Xiamen University, China