Serpent: Scalable and Efficient Image Restoration via Multi-scale Structured State Space Models

📅 2024-03-26

🏛️ arXiv.org

📈 Citations: 4

✨ Influential: 0

🤖 AI Summary

To address the limitations of convolutional models in capturing long-range dependencies and the high computational cost of attention mechanisms in high-resolution image restoration, this paper integrates structured state space models (SSMs) into an image restoration architecture. It proposes a hierarchical, multi-scale architecture that converts the input image into a collection of sequences and processes them at multiple scales, giving a global receptive field at a cost that scales linearly with input size. A hierarchical feature aggregation mechanism keeps reconstruction lightweight and scalable across resolutions. Experiments show reconstruction quality on par with state-of-the-art (SOTA) techniques while reducing FLOPs by up to 150× and GPU memory consumption by up to 5×, with a compact model size and the largest efficiency gains at high image resolutions.
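As a rough, generic illustration of the scaling contrast described above (standard asymptotic costs, not figures taken from the paper): assume an $H \times W$ image is serialized into $N = HW$ tokens with feature width $d$. The symbols $N$ and $d$ are introduced here only for illustration.

$$
\text{self-attention over all tokens: } O(N^2 d), \qquad \text{linear-time sequence scan: } O(N d), \qquad N = HW .
$$

Under these assumptions, doubling both $H$ and $W$ multiplies $N$ by 4, so the attention cost grows by roughly $16\times$ while the linear scan grows by roughly $4\times$. This is consistent with the efficiency gap widening at high resolutions.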

📝 Abstract

The landscape of computational building blocks of efficient image restoration architectures is dominated by a combination of convolutional processing and various attention mechanisms. However, convolutional filters, while efficient, are inherently local and therefore struggle with modeling long-range dependencies in images. In contrast, attention excels at capturing global interactions between arbitrary image regions, but suffers from a quadratic cost in image dimension. In this work, we propose Serpent, an efficient architecture for high-resolution image restoration that combines recent advances in state space models (SSMs) with multi-scale signal processing in its core computational block. SSMs, originally introduced for sequence modeling, can maintain a global receptive field with a favorable linear scaling in input size. We propose a novel hierarchical architecture, inspired by traditional signal processing principles, that converts the input image into a collection of sequences and processes them in a multi-scale fashion. Our experimental results demonstrate that Serpent can achieve reconstruction quality on par with state-of-the-art techniques while requiring orders of magnitude less compute (up to a $150$-fold reduction in FLOPs) and up to $5\times$ less GPU memory, while maintaining a compact model size. The efficiency gains achieved by Serpent are especially notable at high image resolutions.
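The idea of serializing an image and scanning it with a linear-time state space recurrence at several scales can be sketched in a few lines. This is a toy illustration under stated assumptions, not Serpent's actual block: the scalar parameters `a`, `b`, `c`, the row-major serialization, the 2x2 average pooling, and all function names are hypothetical choices made here, whereas practical SSM layers use learned, higher-dimensional parameters.

```python
import numpy as np

def ssm_scan(u, a=0.9, b=1.0, c=1.0):
    """Toy scalar linear SSM: x[k] = a*x[k-1] + b*u[k], y[k] = c*x[k].

    A single pass over the sequence, so the cost is linear in len(u),
    yet each output depends on every earlier input (global context).
    """
    x = 0.0
    y = np.empty(len(u), dtype=float)
    for k, u_k in enumerate(u):
        x = a * x + b * u_k
        y[k] = c * x
    return y

def downsample2(img):
    """2x2 average pooling (assumes even height and width)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def multiscale_scan(img, levels=2):
    """Serialize the image row by row at each scale and scan it."""
    outputs = []
    current = np.asarray(img, dtype=float)
    for _ in range(levels):
        seq = current.reshape(-1)  # hypothetical row-major serialization
        outputs.append(ssm_scan(seq).reshape(current.shape))
        current = downsample2(current)  # move to the next, coarser scale
    return outputs
```

Calling `multiscale_scan(np.ones((4, 4)), levels=2)` returns one scanned feature map per scale, of shapes `(4, 4)` and `(2, 2)`. Coarser scales shorten the sequence, so information from distant pixels needs fewer recurrence steps to arrive.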

Problem

Research questions and friction points this paper is trying to address.

Image Restoration

Convolution Methods

Attention Mechanism

Innovation

Methods, ideas, or system contributions that make the work stand out.

Serpent

Multi-scale Signal Processing

Efficient Image Restoration
