CharCom: Composable Identity Control for Multi-Character Story Illustration

📅 2025-10-11
🤖 AI Summary
Multi-character identity inconsistency remains a critical challenge in text-to-image generation, particularly in complex narrative scenes. Method: the paper proposes a composable, LoRA-based character control framework that freezes the base diffusion model and attaches a modular, parameter-efficient LoRA adapter per character. Prompt-aware dynamic scheduling and feature fusion allow character controllers to be composed on the fly at inference time, without fine-tuning the foundation model. Contribution/Results: to the authors' knowledge, this is the first approach to jointly achieve semantic alignment, temporal coherence, and scalable multi-character personalization. Experiments show significant improvements in character consistency, semantic fidelity, and cross-image coherence, especially in crowded or intricate scenes, while maintaining high visual quality and computational efficiency, indicating strong potential for practical deployment.

📝 Abstract
Ensuring character identity consistency across varying prompts remains a fundamental limitation in diffusion-based text-to-image generation. We propose CharCom, a modular and parameter-efficient framework that achieves character-consistent story illustration through composable LoRA adapters, enabling efficient per-character customization without retraining the base model. Built on a frozen diffusion backbone, CharCom dynamically composes adapters at inference using prompt-aware control. Experiments on multi-scene narratives demonstrate that CharCom significantly enhances character fidelity, semantic alignment, and temporal coherence. It remains robust in crowded scenes and enables scalable multi-character generation with minimal overhead, making it well-suited for real-world applications such as story illustration and animation.
Problem

Research questions and friction points this paper is trying to address.

Ensuring character identity consistency in diffusion-based text-to-image generation
Achieving character-consistent story illustration without retraining base models
Enabling scalable multi-character generation with minimal computational overhead
Innovation

Methods, ideas, or system contributions that make the work stand out.

Modular framework using composable LoRA adapters
Dynamic adapter composition with prompt-aware control
Enables scalable multi-character generation without retraining
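The composable-adapter idea above can be illustrated with a toy sketch: a frozen base weight, one low-rank LoRA delta per character, and a scheduler that activates adapters whose character names appear in the prompt. This is an illustrative reconstruction under assumed simplifications (plain numpy linear layers, substring matching for scheduling), not the authors' implementation; all class and function names here are hypothetical.

```python
import numpy as np

class LoRAAdapter:
    """One character's low-rank update: delta_W = alpha * (B @ A)."""
    def __init__(self, d_out, d_in, rank, alpha=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Standard LoRA init: A small random, B zero, so delta starts at 0.
        self.A = rng.normal(scale=0.01, size=(rank, d_in))
        self.B = np.zeros((d_out, rank))
        self.alpha = alpha

    def delta(self):
        return self.alpha * (self.B @ self.A)

class ComposableLinear:
    """A frozen base weight plus adapters composed at inference time."""
    def __init__(self, W_base):
        self.W = W_base        # frozen; never updated
        self.adapters = {}     # character name -> LoRAAdapter

    def register(self, name, adapter):
        self.adapters[name] = adapter

    def forward(self, x, active, weights=None):
        # Sum the deltas of only the active adapters, scaled per character.
        weights = weights or {name: 1.0 for name in active}
        W_eff = self.W.copy()
        for name in active:
            W_eff += weights[name] * self.adapters[name].delta()
        return W_eff @ x

def schedule_from_prompt(prompt, known_characters):
    """Toy prompt-aware scheduling: activate an adapter iff its
    character's name occurs in the prompt."""
    return [c for c in known_characters if c.lower() in prompt.lower()]

# Usage: two of three registered characters appear in the prompt,
# so only their adapters are composed into this layer.
layer = ComposableLinear(np.eye(4))
for name in ["Alice", "Bob", "Carol"]:
    layer.register(name, LoRAAdapter(4, 4, rank=2))
active = schedule_from_prompt("Alice and Bob walk home", ["Alice", "Bob", "Carol"])
y = layer.forward(np.ones(4), active)
```

Because the base model stays frozen and each delta is rank-limited, adding a new character only means training and registering one small adapter, which is what makes the composition scalable.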
Authors
Zhongsheng Wang, University of Auckland (Large Language Models, AI Agents)
Ming Lin, University of Auckland, Auckland, New Zealand
Zhedong Lin, University of Auckland, Auckland, New Zealand
Yaser Shakib, Bedaia.ai, Auckland, New Zealand
Qian Liu, University of Auckland, Auckland, New Zealand
Jiamou Liu, The University of Auckland (Social Networks, Artificial Intelligence, Machine Learning)