DiNAT-IR: Exploring Dilated Neighborhood Attention for High-Quality Image Restoration

📅 2025-07-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
In high-resolution image restoration, simultaneously modeling global contextual dependencies and preserving local detail remains challenging, because conventional self-attention scales quadratically with image size. To address this, the paper builds on Dilated Neighborhood Attention (DiNA), which uses multi-scale dilated sliding windows for fine-grained local modeling, and pairs it with channel-wise global context aggregation. This combination reduces computational overhead while strengthening long-range dependency capture. Integrated into a lightweight Transformer architecture, DiNAT-IR combines channel-wise attention, sliding-window attention, and a hybrid dilation strategy, augmented by a channel-aware module for adaptive multi-scale feature fusion. Evaluated on multiple image deblurring and restoration benchmarks, DiNAT-IR achieves state-of-the-art or near-state-of-the-art performance with substantially fewer parameters and lower FLOPs, pairing high-fidelity reconstruction with efficient inference.
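The core idea of dilated sliding-window attention can be sketched as follows. This is an illustrative single-head, 1-D simplification under our own naming, not the paper's implementation (which operates on 2-D feature maps with learned projections and multiple heads); it only shows why restricting each query to `kernel` dilated neighbors drops the cost from quadratic to linear in sequence length.

```python
import numpy as np

def dilated_neighborhood_attention(x, kernel=3, dilation=2):
    """Single-head dilated neighborhood attention over a 1-D sequence.

    x: (L, C) token features. Each query attends only to `kernel`
    neighbors spaced `dilation` apart, so the cost is O(L * kernel * C)
    instead of O(L^2 * C) for full self-attention.
    """
    L, C = x.shape
    scale = C ** -0.5
    half = kernel // 2
    out = np.zeros_like(x)
    for i in range(L):
        # Neighbor indices at the chosen dilation, clipped to the sequence.
        idx = np.clip(i + dilation * np.arange(-half, half + 1), 0, L - 1)
        k = x[idx]                       # (kernel, C) keys, reused as values
        attn = (x[i] @ k.T) * scale      # (kernel,) similarity scores
        attn = np.exp(attn - attn.max()) # numerically stable softmax
        attn /= attn.sum()
        out[i] = attn @ k                # weighted sum of neighbor values
    return out
```

A larger dilation widens the receptive field at no extra cost, which is why the paper mixes dilation factors across blocks rather than enlarging the window itself.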

📝 Abstract
Transformers, with their self-attention mechanisms for modeling long-range dependencies, have become a dominant paradigm in image restoration tasks. However, the high computational cost of self-attention limits scalability to high-resolution images, making efficiency-quality trade-offs a key research focus. To address this, Restormer employs channel-wise self-attention, which computes attention across channels instead of spatial dimensions. While effective, this approach may overlook localized artifacts that are crucial for high-quality image restoration. To bridge this gap, we explore Dilated Neighborhood Attention (DiNA) as a promising alternative, inspired by its success in high-level vision tasks. DiNA balances global context and local precision by integrating sliding-window attention with mixed dilation factors, effectively expanding the receptive field without excessive overhead. However, our preliminary experiments indicate that directly applying this global-local design to the classic deblurring task hinders accurate visual restoration, primarily due to the constrained global context understanding within local attention. To address this, we introduce a channel-aware module that complements local attention, effectively integrating global context without sacrificing pixel-level precision. The proposed DiNAT-IR, a Transformer-based architecture specifically designed for image restoration, achieves competitive results across multiple benchmarks, offering a high-quality solution for diverse low-level computer vision problems.
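The channel-wise self-attention the abstract attributes to Restormer can be sketched as below. This is a simplified single-head illustration with our own function name and no learned projections or depth-wise convolutions; it only shows why the attention map is C x C (channels against channels), making the cost independent of spatial resolution, at the price of having no explicit spatial, pixel-level interactions.

```python
import numpy as np

def channel_attention(x, temperature=1.0):
    """Channel-wise self-attention sketch in the spirit of Restormer's
    transposed attention.

    x: (C, N) features, C channels flattened over N = H*W pixels.
    The attention map is C x C, so cost scales with channel count,
    not with image resolution.
    """
    # L2-normalize each channel vector, as done for Q and K in Restormer.
    q = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)
    k = q
    attn = (q @ k.T) * temperature                   # (C, C) similarities
    attn = np.exp(attn - attn.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)          # row-wise softmax
    return attn @ x                                  # reweighted channels
```

Because every output channel is a global mixture over all pixels, localized artifacts can be averaged away, which is the gap the paper's local dilated attention plus channel-aware module aims to close.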
Problem

Research questions and friction points this paper is trying to address.

High computational cost limits self-attention in high-resolution images
Channel-wise self-attention overlooks localized artifacts in restoration
Direct global-local design hinders accurate visual restoration in deblurring
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dilated Neighborhood Attention balances global-local context
Channel-aware module enhances global context integration
DiNAT-IR balances restoration quality against computational cost