Plug-and-Play Linear Attention for Pre-trained Image and Video Restoration Models

📅 2025-06-10
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Multi-head self-attention (MHSA) in vision restoration models suffers from quadratic computational complexity, hindering real-time and resource-constrained deployment. This paper proposes PnP-Nystra, a linear self-attention module based on the Nyström low-rank approximation, designed as a training-free, plug-and-play replacement for MHSA. Without any fine-tuning, PnP-Nystra integrates into pre-trained window-based restoration models including SwinIR, Uformer, and RVRT, achieving a 2-4x speed-up on an NVIDIA RTX 4090 GPU and a 2-5x speed-up on CPU while incurring at most a 1.5 dB PSNR drop across image/video denoising, deblurring, and super-resolution tasks. The core contribution is the first empirical demonstration of linear attention as a training-free substitute for MHSA in frozen, pre-trained restoration architectures.

📝 Abstract
Multi-head self-attention (MHSA) has become a core component in modern computer vision models. However, its quadratic complexity with respect to input length poses a significant computational bottleneck in real-time and resource-constrained environments. We propose PnP-Nystra, a Nyström-based linear approximation of self-attention, developed as a plug-and-play (PnP) module that can be integrated into pre-trained image and video restoration models without retraining. As a drop-in replacement for MHSA, PnP-Nystra enables efficient acceleration in various window-based transformer architectures, including SwinIR, Uformer, and RVRT. Our experiments across diverse image and video restoration tasks, including denoising, deblurring, and super-resolution, demonstrate that PnP-Nystra achieves a 2-4x speed-up on an NVIDIA RTX 4090 GPU and a 2-5x speed-up on CPU inference. Despite these significant gains, the method incurs a maximum PSNR drop of only 1.5 dB across all evaluated tasks. To the best of our knowledge, we are the first to demonstrate linear attention functioning as a training-free substitute for MHSA in restoration models.
Problem

Research questions and friction points this paper is trying to address.

Reduces computational complexity of multi-head self-attention
Enables efficient acceleration without model retraining
Maintains performance with minimal PSNR drop
Innovation

Methods, ideas, or system contributions that make the work stand out.

Nyström-based linear self-attention approximation
Plug-and-play module for pre-trained models
Training-free acceleration for transformer architectures
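The paper does not include implementation details here, but the general Nyström low-rank approximation of softmax attention (as popularized by Nyströmformer) that this idea builds on can be sketched as follows. The landmark-selection scheme (segment means) and the `num_landmarks` parameter are illustrative assumptions, not necessarily the exact PnP-Nystra design:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def nystrom_attention(Q, K, V, num_landmarks=8):
    """Nystrom-style linear approximation of softmax attention.

    Landmark queries/keys are contiguous segment means (assumes
    num_landmarks divides sequence length n). Cost is O(n * m)
    for m landmarks instead of O(n^2) for exact attention.
    """
    n, d = Q.shape
    m = num_landmarks
    Q_l = Q.reshape(m, n // m, d).mean(axis=1)  # (m, d) landmark queries
    K_l = K.reshape(m, n // m, d).mean(axis=1)  # (m, d) landmark keys
    scale = 1.0 / np.sqrt(d)
    F = softmax(Q @ K_l.T * scale)              # (n, m)
    A = softmax(Q_l @ K_l.T * scale)            # (m, m) landmark kernel
    B = softmax(Q_l @ K.T * scale)              # (m, n)
    # Approximate softmax(QK^T/sqrt(d)) V by F @ pinv(A) @ B @ V
    return F @ (np.linalg.pinv(A) @ (B @ V))    # (n, d)
```

Applied per attention window (as in SwinIR-style models), the window length `n` is small and fixed, which is what makes a training-free swap of this kind plausible: the approximation only needs to be accurate over the frozen model's existing query/key distributions.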
Srinivasan Kidambi
Department of Electrical Engineering, Indian Institute of Technology, Madras 600036, India
Pravin Nair
Assistant Professor, Dept of Electrical Engineering, Indian Institute of Technology, Madras
Computer Vision · Machine Learning · Image Processing · Optimization