Directing the Narrative: A Finetuning Method for Controlling Coherence and Style in Story Generation

📅 2026-03-17
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the challenges of character identity inconsistency and visual style drift in story generation by proposing a two-stage optimization framework. First, it introduces a Group-Shared Attention (GSA) mechanism that enables lossless cross-frame identity modeling within the attention layers, preserving character consistency without requiring external encoders. Second, it integrates Direct Preference Optimization (DPO) to jointly enhance visual fidelity and narrative coherence through alignment with human preferences. Notably, this is the first approach to leverage holistic preference learning for the collaborative optimization of identity and style. Evaluated on the ViStoryBench benchmark, the method achieves a new state of the art, improving Character Identity Consistency (CIDS) by 10.0 and Character Style Consistency (CSD) by 18.7 while maintaining high generation quality.

Technology Category

Application Category

📝 Abstract
Story visualization requires generating sequential imagery that aligns semantically with evolving narratives while maintaining rigorous consistency in character identity and visual style. However, existing methodologies often struggle with subject inconsistency and identity drift, particularly when depicting complex interactions or extended narrative arcs. To address these challenges, we propose a cohesive two-stage framework designed for robust and consistent story generation. First, we introduce Group-Shared Attention (GSA), a mechanism that fosters intrinsic consistency by enabling lossless cross-sample information flow within attention layers. This allows the model to structurally encode identity correspondence across frames without relying on external encoders. Second, we leverage Direct Preference Optimization (DPO) to align generated outputs with human aesthetic and narrative standards. Unlike conventional methods that rely on conflicting auxiliary losses, our approach simultaneously enhances visual fidelity and identity preservation by learning from holistic preference data. Extensive evaluations on the ViStoryBench benchmark demonstrate that our method establishes a new state-of-the-art, significantly outperforming strong baselines with gains of +10.0 in Character Identity (CIDS) and +18.7 in Style Consistency (CSD), all while preserving high-fidelity generation.
Problem

Research questions and friction points this paper is trying to address.

story visualization
character identity consistency
visual style consistency
identity drift
narrative coherence
Innovation

Methods, ideas, or system contributions that make the work stand out.

Group-Shared Attention
Direct Preference Optimization
story visualization
identity consistency
style coherence
J
Jianzhang Zhang
Department of Management Science and Engineering, Hangzhou Normal University, Hangzhou, Zhejiang, P.R. China
Y
Yijing Tian
Department of Management Science and Engineering, Hangzhou Normal University, Hangzhou, Zhejiang, P.R. China
J
Jiwang Qu
Department of Management Science and Engineering, Hangzhou Normal University, Hangzhou, Zhejiang, P.R. China
Chuang Liu
Chuang Liu
Hangzhou Normal Univerisy
complex networkbioinformatics