HAT: Hybrid Attention Transformer for Image Restoration

📅 2023-09-11
🏛️ arXiv.org
📈 Citations: 69
✨ Influential: 9
🤖 AI Summary
Existing Transformer-based image restoration models suffer from narrow spatial receptive fields and weak inter-window feature interaction due to rigid local window partitioning. To address these limitations, this work proposes a hybrid attention architecture comprising three key innovations: (1) a novel cooperative modeling mechanism that jointly integrates channel-wise attention with window-based self-attention; (2) an overlapping cross-attention module explicitly designed to enhance inter-window feature communication; and (3) a same-task pretraining strategy to strengthen representation learning. Extensive experiments demonstrate state-of-the-art performance across multiple low-level vision tasks, including synthetic and real-world super-resolution, Gaussian denoising, and compression artifact removal. Our method achieves significant PSNR and SSIM improvements over prior art and delivers superior visual quality with more natural textures and sharper details.
📝 Abstract
Transformer-based methods have shown impressive performance in image restoration tasks, such as image super-resolution and denoising. However, we find that these networks can only utilize a limited spatial range of input information through attribution analysis. This implies that the potential of Transformer is still not fully exploited in existing networks. In order to activate more input pixels for better restoration, we propose a new Hybrid Attention Transformer (HAT). It combines both channel attention and window-based self-attention schemes, thus making use of their complementary advantages. Moreover, to better aggregate the cross-window information, we introduce an overlapping cross-attention module to enhance the interaction between neighboring window features. In the training stage, we additionally adopt a same-task pre-training strategy to exploit the potential of the model for further improvement. Extensive experiments have demonstrated the effectiveness of the proposed modules. We further scale up the model to show that the performance of the SR task can be greatly improved. Besides, we extend HAT to more image restoration applications, including real-world image super-resolution, Gaussian image denoising and image compression artifacts reduction. Experiments on benchmark and real-world datasets demonstrate that our HAT achieves state-of-the-art performance both quantitatively and qualitatively. Codes and models are publicly available at https://github.com/XPixelGroup/HAT.
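To make the combination described in the abstract concrete, below is a minimal PyTorch-style sketch of a hybrid attention block: a squeeze-and-excitation channel-attention branch runs alongside window-based self-attention, and both are added back through a residual connection. The class names, window size, head count, and the small weighting constant on the channel branch are illustrative assumptions, not the released HAT implementation (see the repository linked in the abstract).

```python
# Minimal sketch of the hybrid-attention idea: channel attention in parallel
# with window self-attention. Hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn


class ChannelAttentionBlock(nn.Module):
    """Squeeze-and-excitation style channel attention over conv features."""

    def __init__(self, dim, reduction=16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1),
            nn.GELU(),
            nn.Conv2d(dim, dim, 3, padding=1),
        )
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(dim, dim // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(dim // reduction, dim, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):                      # x: (B, C, H, W)
        feat = self.body(x)
        return feat * self.attn(feat)          # re-weight channels globally


class HybridAttentionBlock(nn.Module):
    """Window self-attention plus a parallel channel-attention branch."""

    def __init__(self, dim, num_heads=6, window_size=16, cab_weight=0.01):
        super().__init__()
        self.window_size = window_size
        self.cab_weight = cab_weight
        self.norm = nn.LayerNorm(dim)
        self.win_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cab = ChannelAttentionBlock(dim)

    def forward(self, x):                      # x: (B, C, H, W)
        B, C, H, W = x.shape
        ws = self.window_size
        # Channel-attention branch on the full 2-D feature map.
        cab_out = self.cab(x)
        # Partition into non-overlapping windows, attend within each window.
        t = x.view(B, C, H // ws, ws, W // ws, ws)
        t = t.permute(0, 2, 4, 3, 5, 1).reshape(-1, ws * ws, C)
        t = self.norm(t)
        t, _ = self.win_attn(t, t, t)
        t = t.reshape(B, H // ws, W // ws, ws, ws, C)
        t = t.permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)
        # Residual sum of both branches; channel branch is down-weighted.
        return x + t + self.cab_weight * cab_out


if __name__ == "__main__":
    block = HybridAttentionBlock(dim=60)
    out = block(torch.randn(1, 60, 64, 64))    # H, W divisible by window_size
    print(out.shape)                           # torch.Size([1, 60, 64, 64])
```

In this sketch the channel branch is deliberately down-weighted so that its globally pooled statistics complement, rather than dominate, the local window attention.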
Problem

Research questions and friction points this paper is trying to address.

Enabling Transformers to utilize a broader spatial range of the input for restoration
Combining channel attention and window-based self-attention to exploit their complementary advantages
Strengthening feature interaction across neighboring windows, which rigid window partitioning limits
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid Attention Transformer combines channel and window attention
Overlapping cross-attention module enhances interaction between neighboring windows (see the sketch after this list)
Same-task pre-training strategy further improves model performance
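As referenced in the second bullet above, the following sketch shows one way to realize overlapping cross-attention: queries still come from standard non-overlapping windows, while keys and values are unfolded from enlarged, overlapping windows so neighboring windows exchange information. The function name and the window_size/overlap_ratio parameters are hypothetical illustrations; the actual implementation is in the linked repository.

```python
# Hedged sketch of overlapping key/value window extraction for
# cross-attention; names and defaults are illustrative assumptions.
import torch
import torch.nn.functional as F


def overlapping_kv_windows(x, window_size=16, overlap_ratio=0.5):
    """Extract enlarged, overlapping windows to serve as keys/values.

    x: (B, C, H, W) feature map; returns (B * num_windows, owin*owin, C).
    """
    owin = int(window_size * (1 + overlap_ratio))      # enlarged window side
    pad = (owin - window_size) // 2
    # Unfold slides an owin x owin window with stride = window_size, so each
    # query window sees a larger, overlapping neighbourhood as its K/V set.
    patches = F.unfold(x, kernel_size=owin, stride=window_size, padding=pad)
    B, _, num_windows = patches.shape
    C = x.shape[1]
    patches = patches.view(B, C, owin * owin, num_windows)
    return patches.permute(0, 3, 2, 1).reshape(-1, owin * owin, C)


if __name__ == "__main__":
    feat = torch.randn(1, 60, 64, 64)
    kv = overlapping_kv_windows(feat)
    print(kv.shape)   # torch.Size([16, 576, 60]): 16 windows of 24x24 tokens
```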
Xiangyu Chen
State Key Laboratory of Internet of Things for Smart City, University of Macau
Xintao Wang
Applied Research Center, Tencent PCG, Shenzhen, China
Wenlong Zhang
Shanghai Artificial Intelligence Laboratory, Shanghai, China
Xiangtao Kong
The Hong Kong Polytechnic University
image restoration
Y. Qiao
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
Jiantao Zhou
Professor, Department of Computer and Information Science, University of Macau
Information Forensics and Security, Multimedia Signal Processing, Machine Learning
Chao Dong
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
image restoration, including super-resolution, denoising, etc.