Ultra High-Resolution Image Inpainting with Patch-Based Content Consistency Adapter

📅 2025-10-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address global structural distortion, local texture degradation, and poor text-prompt alignment in ultra-high-resolution (4K+) image inpainting, this paper proposes Patch-Adapter—a dual-stage adapter framework that operates without modifying pre-trained diffusion models. The method decouples global semantic coherence from local detail fidelity: Stage I employs dual-context adapters to learn coherence from downsampled features; Stage II introduces reference-image patch attention to enable adaptive, full-resolution patch-level feature fusion. Evaluated on OpenImages and Photo-Concept-Bucket, Patch-Adapter achieves state-of-the-art performance, significantly suppressing large-area inpainting artifacts while improving perceptual quality and text–image alignment accuracy.

📝 Abstract
In this work, we present Patch-Adapter, an effective framework for high-resolution text-guided image inpainting. Unlike existing methods limited to lower resolutions, our approach achieves 4K+ resolution while maintaining precise content consistency and prompt alignment, two critical challenges in image inpainting that intensify with increasing resolution and texture complexity. Patch-Adapter leverages a two-stage adapter architecture to scale the diffusion model's resolution from 1K to 4K+ without requiring structural overhauls: (1) Dual Context Adapter learns coherence between masked and unmasked regions at reduced resolutions to establish global structural consistency; and (2) Reference Patch Adapter implements a patch-level attention mechanism for full-resolution inpainting, preserving local detail fidelity through adaptive feature fusion. This dual-stage architecture uniquely addresses the scalability gap in high-resolution inpainting by decoupling global semantics from localized refinement. Experiments demonstrate that Patch-Adapter not only resolves artifacts common in large-scale inpainting but also achieves state-of-the-art performance on the OpenImages and Photo-Concept-Bucket datasets, outperforming existing methods in both perceptual quality and text-prompt adherence.
Problem

Research questions and friction points this paper is trying to address.

Achieving 4K+ resolution in text-guided image inpainting
Maintaining content consistency with increasing texture complexity
Preserving local detail fidelity while scaling diffusion models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage adapter architecture scales diffusion model resolution
Dual Context Adapter ensures global structural consistency
Reference Patch Adapter preserves local detail fidelity
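The decoupling described above — a low-resolution global pass whose features then guide patch-level attention at full resolution — can be illustrated with a minimal numpy sketch. This is a toy illustration only, not the paper's implementation: the function names, shapes, and the residual fusion are assumptions, and the real Reference Patch Adapter operates inside a pre-trained diffusion model's attention layers.

```python
import numpy as np

def split_into_patches(feat, patch):
    """Split an (H, W, C) feature map into non-overlapping patches,
    returned as (num_patches, patch*patch, C) token sequences."""
    H, W, C = feat.shape
    tiles = feat.reshape(H // patch, patch, W // patch, patch, C)
    return tiles.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch, C)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def patch_reference_attention(hires_feat, ref_feat, patch=4):
    """Toy patch-level attention: each full-resolution patch (queries)
    attends over reference features from the downsampled global pass
    (keys/values), and the attended output is fused back residually."""
    C = hires_feat.shape[-1]
    q = split_into_patches(hires_feat, patch)   # (N, patch*patch, C)
    kv = ref_feat.reshape(-1, C)                # (M, C) reference tokens
    attn = softmax(q @ kv.T / np.sqrt(C))       # (N, patch*patch, M)
    fused = q + attn @ kv                       # adaptive residual fusion
    # Stitch the patches back into an (H, W, C) feature map.
    H, W, _ = hires_feat.shape
    n_h, n_w = H // patch, W // patch
    out = fused.reshape(n_h, n_w, patch, patch, C).transpose(0, 2, 1, 3, 4)
    return out.reshape(H, W, C)

# Hypothetical shapes: a 16x16 "full-resolution" feature map guided by
# a 4x4 global feature map from the downsampled Stage-I pass.
hires = np.random.rand(16, 16, 8)
ref = np.random.rand(4, 4, 8)
out = patch_reference_attention(hires, ref)
print(out.shape)  # (16, 16, 8)
```

The point of the sketch is the data flow: global context is computed once at low resolution, then injected per patch, so memory scales with patch size rather than with the full 4K+ image.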
Authors
Jianhui Zhang — University of Electronic Science and Technology of China
Sheng Cheng — Megvii Technology
Qirui Sun — Dzine AI, SeeKoo
Jia Liu — University of Electronic Science and Technology of China
Wang Luyang
Chaoyu Feng
Chen Fang — Research Scientist @ Adobe Research (Computer Vision, Machine Learning)
Lei Lei
Jue Wang — Dzine AI, SeeKoo
Shuaicheng Liu — University of Electronic Science and Technology of China (Computer Vision, Computational Photography)