TransRef: Multi-Scale Reference Embedding Transformer for Reference-Guided Image Inpainting

📅 2023-06-20
🏛️ arXiv.org
📈 Citations: 4 (Influential: 0)
🤖 AI Summary
To address image inpainting under complex semantic scenes and diverse hole patterns, this paper proposes TransRef, a transformer-based encoder-decoder network that embeds reference-image features progressively across multiple scales. It introduces two modules: Reference-Patch Alignment (Ref-PA), which aligns the patch features of the reference and corrupted images and harmonizes their style differences, and Reference-Patch Transformer (Ref-PT), which refines the embedded reference features. The contributions include: (1) a publicly accessible benchmark dataset of 50K input-reference image pairs; (2) quantitative and qualitative results showing gains over state-of-the-art methods on complex holes; and (3) public release of the code and dataset.
📝 Abstract
Image inpainting for completing complicated semantic environments and diverse hole patterns of corrupted images is challenging even for state-of-the-art learning-based inpainting methods trained on large-scale data. A reference image capturing the same scene of a corrupted image offers informative guidance for completing the corrupted image as it shares similar texture and structure priors to that of the holes of the corrupted image. In this work, we propose a transformer-based encoder-decoder network, named TransRef, for reference-guided image inpainting. Specifically, the guidance is conducted progressively through a reference embedding procedure, in which the referencing features are subsequently aligned and fused with the features of the corrupted image. For precise utilization of the reference features for guidance, a reference-patch alignment (Ref-PA) module is proposed to align the patch features of the reference and corrupted images and harmonize their style differences, while a reference-patch transformer (Ref-PT) module is proposed to refine the embedded reference feature. Moreover, to facilitate the research of reference-guided image restoration tasks, we construct a publicly accessible benchmark dataset containing 50K pairs of input and reference images. Both quantitative and qualitative evaluations demonstrate the efficacy of the reference information and the proposed method over the state-of-the-art methods in completing complex holes. Code and dataset can be accessed at https://github.com/Cameltr/TransRef.
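As a rough illustration of the idea of aligning reference patch features to a corrupted image and harmonizing their style, the sketch below uses plain NumPy. It is not the authors' Ref-PA or Ref-PT module: the function names (`extract_patches`, `fold_patches`, `align_reference`), the cosine-similarity soft matching, the `temperature` parameter, and the mean/std style matching are all assumptions chosen for brevity. In a real network the matching would operate on learned encoder features at several scales.

```python
import numpy as np

def extract_patches(feat, patch):
    """Split a (C, H, W) map into non-overlapping patches, shape (N, C*patch*patch)."""
    c, h, w = feat.shape
    assert h % patch == 0 and w % patch == 0, "H and W must be multiples of patch"
    gh, gw = h // patch, w // patch
    x = feat.reshape(c, gh, patch, gw, patch).transpose(1, 3, 0, 2, 4)
    return x.reshape(gh * gw, c * patch * patch)

def fold_patches(patches, c, h, w, patch):
    """Inverse of extract_patches: rebuild the (C, H, W) map."""
    gh, gw = h // patch, w // patch
    x = patches.reshape(gh, gw, c, patch, patch).transpose(2, 0, 3, 1, 4)
    return x.reshape(c, h, w)

def align_reference(corrupted, reference, mask, patch=4, temperature=0.1):
    """Hypothetical alignment step. mask is (H, W) with 1 marking hole pixels."""
    c, h, w = corrupted.shape
    q = extract_patches(corrupted, patch)   # queries from the corrupted image
    k = extract_patches(reference, patch)   # keys/values from the reference

    # Cosine similarity between every corrupted patch and every reference patch.
    qn = q / (np.linalg.norm(q, axis=1, keepdims=True) + 1e-8)
    kn = k / (np.linalg.norm(k, axis=1, keepdims=True) + 1e-8)
    logits = (qn @ kn.T) / temperature

    # Row-wise softmax gives soft matching weights over reference patches.
    logits = logits - logits.max(axis=1, keepdims=True)
    attn = np.exp(logits)
    attn = attn / attn.sum(axis=1, keepdims=True)

    aligned = fold_patches(attn @ k, c, h, w, patch)

    # Crude style harmonization (an assumption): match the per-channel mean/std
    # of the aligned features to the known (non-hole) region of the input.
    known = mask == 0
    mu_c = corrupted[:, known].mean(axis=1)[:, None, None]
    sd_c = corrupted[:, known].std(axis=1)[:, None, None]
    mu_a = aligned.mean(axis=(1, 2), keepdims=True)
    sd_a = aligned.std(axis=(1, 2), keepdims=True)
    harmonized = (aligned - mu_a) / (sd_a + 1e-8) * sd_c + mu_c

    # Keep known pixels untouched; fill holes with harmonized reference content.
    return corrupted * (1 - mask) + harmonized * mask

rng = np.random.default_rng(0)
corrupted = rng.normal(size=(3, 8, 8))
reference = rng.normal(size=(3, 8, 8))
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1  # square hole
filled = align_reference(corrupted, reference, mask)
```

The fusion in the last line of `align_reference` leaves known pixels unchanged, so only the hole region receives reference content.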
Problem

Research questions and friction points this paper is trying to address.

Challenges in complex image inpainting
Reference-guided image completion method
Alignment and fusion of reference features
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based encoder-decoder network
Reference-patch alignment module
Reference-patch transformer module
Liang Liao
School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798
Taorong Liu
School of Computer Science, Wuhan University, Wuhan 430072, China
Delin Chen
HKU
MultimodalAgent
Jing Xiao
School of Computer Science, Wuhan University, Wuhan 430072, China
Zheng Wang
School of Computer Science, Wuhan University, Wuhan 430072, China
Chia-Wen Lin
Department of Electrical Engineering and the Institute of Communications Engineering, National Tsing Hua University, Hsinchu 30013, Taiwan