Ultra High-Resolution Image Inpainting with Patch-Based Content Consistency Adapter

📅 2025-10-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address global structural distortion, local texture degradation, and poor text-prompt alignment in ultra-high-resolution (4K+) image inpainting, this paper proposes Patch-Adapter—a dual-stage adapter framework that operates without modifying pre-trained diffusion models. The method decouples global semantic coherence from local detail fidelity: Stage I employs dual-context adapters to learn coherence from downsampled features; Stage II introduces reference-image patch attention to enable adaptive, full-resolution patch-level feature fusion. Evaluated on OpenImages and Photo-Concept-Bucket, Patch-Adapter achieves state-of-the-art performance, significantly suppressing large-area inpainting artifacts while improving perceptual quality and text–image alignment accuracy.

📝 Abstract
In this work, we present Patch-Adapter, an effective framework for high-resolution text-guided image inpainting. Unlike existing methods limited to lower resolutions, our approach achieves 4K+ resolution while maintaining precise content consistency and prompt alignment, two critical challenges in image inpainting that intensify with increasing resolution and texture complexity. Patch-Adapter leverages a two-stage adapter architecture to scale the diffusion model's resolution from 1K to 4K+ without requiring structural overhauls: (1) Dual Context Adapter learns coherence between masked and unmasked regions at reduced resolutions to establish global structural consistency; and (2) Reference Patch Adapter implements a patch-level attention mechanism for full-resolution inpainting, preserving local detail fidelity through adaptive feature fusion. This dual-stage architecture uniquely addresses the scalability gap in high-resolution inpainting by decoupling global semantics from localized refinement. Experiments demonstrate that Patch-Adapter not only resolves artifacts common in large-scale inpainting but also achieves state-of-the-art performance on the OpenImages and Photo-Concept-Bucket datasets, outperforming existing methods in both perceptual quality and text-prompt adherence.
Problem

Research questions and friction points this paper is trying to address.

Achieving 4K+ resolution in text-guided image inpainting
Maintaining content consistency with increasing texture complexity
Preserving local detail fidelity while scaling diffusion models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage adapter architecture scales diffusion model resolution
Dual Context Adapter ensures global structural consistency
Reference Patch Adapter preserves local detail fidelity
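The decoupling described above — a low-resolution global pass whose features then guide patch-level attention at full resolution — can be illustrated with a minimal numpy sketch. This is a toy illustration only, not the paper's implementation: the function names, shapes, and the residual fusion are assumptions, and the real Reference Patch Adapter operates inside a pre-trained diffusion model's attention layers.

```python
import numpy as np

def split_into_patches(feat, patch):
    """Split an (H, W, C) feature map into non-overlapping patches,
    returned as (num_patches, patch*patch, C) token sequences."""
    H, W, C = feat.shape
    tiles = feat.reshape(H // patch, patch, W // patch, patch, C)
    return tiles.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch, C)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def patch_reference_attention(hires_feat, ref_feat, patch=4):
    """Toy patch-level attention: each full-resolution patch (queries)
    attends over reference features from the downsampled global pass
    (keys/values), and the attended output is fused back residually."""
    C = hires_feat.shape[-1]
    q = split_into_patches(hires_feat, patch)   # (N, patch*patch, C)
    kv = ref_feat.reshape(-1, C)                # (M, C) reference tokens
    attn = softmax(q @ kv.T / np.sqrt(C))       # (N, patch*patch, M)
    fused = q + attn @ kv                       # adaptive residual fusion
    # Stitch the patches back into an (H, W, C) feature map.
    H, W, _ = hires_feat.shape
    n_h, n_w = H // patch, W // patch
    out = fused.reshape(n_h, n_w, patch, patch, C).transpose(0, 2, 1, 3, 4)
    return out.reshape(H, W, C)

# Hypothetical shapes: a 16x16 "full-resolution" feature map guided by
# a 4x4 global feature map from the downsampled Stage-I pass.
hires = np.random.rand(16, 16, 8)
ref = np.random.rand(4, 4, 8)
out = patch_reference_attention(hires, ref)
print(out.shape)  # (16, 16, 8)
```

The point of the sketch is the data flow: global context is computed once at low resolution, then injected per patch, so memory scales with patch size rather than with the full 4K+ image.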
Authors
Jianhui Zhang — University of Electronic Science and Technology of China
Sheng Cheng — Megvii Technology
Qirui Sun — Dzine AI, SeeKoo
Jia Liu — University of Electronic Science and Technology of China
Wang Luyang
Chaoyu Feng
Chen Fang — Research Scientist @ Adobe Research (Computer Vision, Machine Learning)
Lei Lei
Jue Wang — Dzine AI, SeeKoo
Shuaicheng Liu — University of Electronic Science and Technology of China (Computer Vision, Computational Photography)