DP-Adapter: Dual-Pathway Adapter for Boosting Fidelity and Text Consistency in Customizable Human Image Generation

📅 2025-02-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
In personalized portrait generation, balancing identity fidelity and text-prompt alignment remains challenging due to mutual interference between the visual and textual modalities during joint modeling. To address this, we propose a region-decoupled dual-pathway adapter framework: the target image is partitioned into visually sensitive regions (carrying identity-critical detail) and text-sensitive regions (governed by semantic control), and a dedicated Identity-Enhancing Adapter (IEA) and Textual-Consistency Adapter (TCA) are applied to the respective regions. A Fine-Grained Feature-Level Blending (FFB) module fuses hierarchical features from the two pathways for coherent synthesis. Built on adapter fine-tuning of a diffusion model, our method supports applications such as headshot-to-full-body portrait generation and attribute editing (e.g., age, expression). Evaluated on multiple customized-portrait benchmarks, our approach achieves a 12.6% improvement in ID preservation rate and a 9.3% gain in CLIP text-alignment score, while attaining state-of-the-art naturalness and controllability.

📝 Abstract
With the growing popularity of personalized human content creation and sharing, there is a rising demand for advanced techniques in customized human image generation. However, current methods struggle to simultaneously maintain the fidelity of human identity and ensure the consistency of textual prompts, often resulting in suboptimal outcomes. This shortcoming is primarily due to the lack of effective constraints during the simultaneous integration of visual and textual prompts, leading to unhealthy mutual interference that compromises the full expression of both types of input. Building on prior research that suggests visual and textual conditions influence different regions of an image in distinct ways, we introduce a novel Dual-Pathway Adapter (DP-Adapter) to enhance both high-fidelity identity preservation and textual consistency in personalized human image generation. Our approach begins by decoupling the target human image into visually sensitive and text-sensitive regions. For visually sensitive regions, DP-Adapter employs an Identity-Enhancing Adapter (IEA) to preserve detailed identity features. For text-sensitive regions, we introduce a Textual-Consistency Adapter (TCA) to minimize visual interference and ensure the consistency of textual semantics. To seamlessly integrate these pathways, we develop a Fine-Grained Feature-Level Blending (FFB) module that efficiently combines hierarchical semantic features from both pathways, resulting in more natural and coherent synthesis outcomes. Additionally, DP-Adapter supports various innovative applications, including controllable headshot-to-full-body portrait generation, age editing, old-photo to reality, and expression editing.
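The pipeline the abstract describes (region decoupling, per-region adapters, feature-level blending) can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's implementation: the binary region mask, the `iea`/`tca` stand-in transforms, and the mask-weighted fusion in `ffb` are all placeholder assumptions; the actual adapters are learned modules inside a diffusion U-Net, and FFB blends hierarchical features across levels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature map (C, H, W), standing in for a U-Net feature level.
C, H, W = 8, 16, 16
features = rng.standard_normal((C, H, W))

# Region decoupling: a binary mask marks visually sensitive (identity) pixels;
# its complement marks text-sensitive pixels. In the paper this mask is derived
# from the target human image; here it is a random placeholder.
id_mask = (rng.random((1, H, W)) > 0.5).astype(features.dtype)
txt_mask = 1.0 - id_mask

def iea(x):
    # Stand-in for the Identity-Enhancing Adapter (IEA): any learned transform.
    return np.tanh(x)

def tca(x):
    # Stand-in for the Textual-Consistency Adapter (TCA).
    return np.maximum(x, 0.0)

def ffb(id_feat, txt_feat, id_mask, txt_mask):
    # Feature-level blending of the two pathways. A mask-weighted sum is an
    # assumption; the paper's FFB fuses hierarchical semantic features.
    return id_mask * id_feat + txt_mask * txt_feat

blended = ffb(iea(features), tca(features), id_mask, txt_mask)
assert blended.shape == features.shape
```

Because the masks are binary and complementary, identity regions of `blended` come purely from the IEA pathway and text regions purely from the TCA pathway, which is the decoupling the paper argues prevents the two conditions from interfering with each other.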
Problem

Research questions and friction points this paper is trying to address.

Enhancing identity fidelity in human image generation
Ensuring textual prompt consistency in image synthesis
Minimizing visual-textual interference in personalized content creation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-Pathway Adapter decouples visually sensitive and text-sensitive regions to preserve identity fidelity
Textual-Consistency Adapter minimizes visual interference to keep textual semantics consistent
Fine-Grained Feature-Level Blending fuses hierarchical features from both pathways
Ye Wang
School of Artificial Intelligence, Jilin University
Xuping Xie
Old Dominion University
Machine Learning · Reduced Order Modeling · Fluid Dynamics · Signal Processing
Lanjun Wang
School of New Media and Communication, Tianjin University
Zili Yi
School of Intelligence Science and Technology, Nanjing University
Rui Ma
School of Artificial Intelligence, Jilin University; Engineering Research Center of Knowledge-Driven Human-Machine Intelligence, MOE, China