🤖 AI Summary
This work addresses character identity inconsistency and visual style drift in story generation by proposing a two-stage optimization framework. First, it introduces a Group-Shared Attention (GSA) mechanism that enables lossless cross-frame identity modeling within the attention layers, preserving character consistency without requiring external encoders. Second, it applies Direct Preference Optimization (DPO) to jointly enhance visual fidelity and narrative coherence through alignment with human preferences. Notably, this is the first approach to leverage holistic preference learning for the joint optimization of identity and style. Evaluated on the ViStoryBench benchmark, the method achieves a new state of the art, improving Character Identity (CIDS) by +10.0 and Style Consistency (CSD) by +18.7 while maintaining high generation quality.
📝 Abstract
Story visualization requires generating sequential imagery that aligns semantically with an evolving narrative while maintaining rigorous consistency in character identity and visual style. However, existing methods often struggle with subject inconsistency and identity drift, particularly when depicting complex interactions or extended narrative arcs. To address these challenges, we propose a cohesive two-stage framework for robust and consistent story generation. First, we introduce Group-Shared Attention (GSA), a mechanism that fosters intrinsic consistency by enabling lossless cross-sample information flow within the attention layers. This allows the model to structurally encode identity correspondence across frames without relying on external encoders. Second, we leverage Direct Preference Optimization (DPO) to align generated outputs with human aesthetic and narrative standards. Unlike conventional methods that rely on conflicting auxiliary losses, our approach simultaneously enhances visual fidelity and identity preservation by learning from holistic preference data. Extensive evaluations on the ViStoryBench benchmark demonstrate that our method establishes a new state of the art, significantly outperforming strong baselines with gains of +10.0 in Character Identity (CIDS) and +18.7 in Style Consistency (CSD), all while preserving high-fidelity generation.
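To make the two stages concrete, the NumPy sketch below illustrates (a) an attention layer whose keys and values are pooled across all frames in a story group, so each frame's queries can attend to identity cues from every other frame, and (b) the standard DPO objective on (preferred, rejected) pairs. All function names, shapes, and the choice of `beta` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def group_shared_attention(q, k, v):
    """Toy group-shared attention (illustrative, not the paper's code).

    q, k, v: arrays of shape (frames, tokens, dim). Keys and values are
    pooled across the whole group, so each frame attends to every frame's
    tokens and identity information flows across frames losslessly inside
    the attention layer, with no external identity encoder.
    """
    F, N, D = q.shape
    k_shared = k.reshape(F * N, D)            # pool keys across the group
    v_shared = v.reshape(F * N, D)            # pool values across the group
    scores = q @ k_shared.T / np.sqrt(D)      # (F, N, F*N) cross-frame scores
    return softmax(scores, axis=-1) @ v_shared  # (F, N, D)

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss: -log sigmoid(beta * implicit reward margin),
    where the margin compares policy-vs-reference log-likelihoods of the
    preferred (w) and rejected (l) samples."""
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -np.log(1.0 / (1.0 + np.exp(-beta * margin)))
```

As a sanity check, a zero margin gives a loss of log 2, and raising the preferred sample's likelihood under the policy lowers the loss, which is the preference-alignment pressure the second stage relies on.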