Plug-and-Play Linear Attention for Pre-trained Image and Video Restoration Models

📅 2025-06-10
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Multi-head self-attention (MHSA) in vision restoration models suffers from quadratic computational complexity, hindering real-time and resource-constrained deployment. This paper proposes PnP-Nystra, a linear self-attention module based on the Nyström low-rank approximation, designed as a training-free, plug-and-play replacement for MHSA. Without any fine-tuning, PnP-Nystra integrates into pre-trained window-based restoration models including SwinIR, Uformer, and RVRT, achieving a 2-4x speed-up on an NVIDIA RTX 4090 GPU and a 2-5x speed-up on CPU while incurring at most a 1.5 dB PSNR drop across image/video denoising, deblurring, and super-resolution tasks. The core contribution is the first empirical demonstration of linear attention as a training-free substitute for MHSA in frozen, pre-trained restoration architectures.

📝 Abstract
Multi-head self-attention (MHSA) has become a core component in modern computer vision models. However, its quadratic complexity with respect to input length poses a significant computational bottleneck in real-time and resource-constrained environments. We propose PnP-Nystra, a Nyström-based linear approximation of self-attention, developed as a plug-and-play (PnP) module that can be integrated into pre-trained image and video restoration models without retraining. As a drop-in replacement for MHSA, PnP-Nystra enables efficient acceleration in various window-based transformer architectures, including SwinIR, Uformer, and RVRT. Our experiments across diverse image and video restoration tasks, including denoising, deblurring, and super-resolution, demonstrate that PnP-Nystra achieves a 2-4x speed-up on an NVIDIA RTX 4090 GPU and a 2-5x speed-up on CPU inference. Despite these significant gains, the method incurs a maximum PSNR drop of only 1.5 dB across all evaluated tasks. To the best of our knowledge, we are the first to demonstrate linear attention functioning as a training-free substitute for MHSA in restoration models.
Problem

Research questions and friction points this paper is trying to address.

Reduces computational complexity of multi-head self-attention
Enables efficient acceleration without model retraining
Maintains performance with minimal PSNR drop
Innovation

Methods, ideas, or system contributions that make the work stand out.

Nyström-based linear self-attention approximation
Plug-and-play module for pre-trained models
Training-free acceleration for transformer architectures
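The paper does not include implementation details here, but the general Nyström low-rank approximation of softmax attention (as popularized by Nyströmformer) that this idea builds on can be sketched as follows. The landmark-selection scheme (segment means) and the `num_landmarks` parameter are illustrative assumptions, not necessarily the exact PnP-Nystra design:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def nystrom_attention(Q, K, V, num_landmarks=8):
    """Nystrom-style linear approximation of softmax attention.

    Landmark queries/keys are contiguous segment means (assumes
    num_landmarks divides sequence length n). Cost is O(n * m)
    for m landmarks instead of O(n^2) for exact attention.
    """
    n, d = Q.shape
    m = num_landmarks
    Q_l = Q.reshape(m, n // m, d).mean(axis=1)  # (m, d) landmark queries
    K_l = K.reshape(m, n // m, d).mean(axis=1)  # (m, d) landmark keys
    scale = 1.0 / np.sqrt(d)
    F = softmax(Q @ K_l.T * scale)              # (n, m)
    A = softmax(Q_l @ K_l.T * scale)            # (m, m) landmark kernel
    B = softmax(Q_l @ K.T * scale)              # (m, n)
    # Approximate softmax(QK^T/sqrt(d)) V by F @ pinv(A) @ B @ V
    return F @ (np.linalg.pinv(A) @ (B @ V))    # (n, d)
```

Applied per attention window (as in SwinIR-style models), the window length `n` is small and fixed, which is what makes a training-free swap of this kind plausible: the approximation only needs to be accurate over the frozen model's existing query/key distributions.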
Srinivasan Kidambi
Department of Electrical Engineering, Indian Institute of Technology, Madras 600036, India
Pravin Nair
Assistant Professor, Dept of Electrical Engineering, Indian Institute of Technology, Madras
Computer Vision · Machine Learning · Image Processing · Optimization