ViStoryBench: Comprehensive Benchmark Suite for Story Visualization

📅 2025-05-30
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address the lack of standardized evaluation benchmarks for story visualization, this paper introduces ViStoryBench, the first comprehensive benchmark for assessing narrative image generation. It encompasses diverse narrative genres (e.g., comedy, horror), artistic styles (e.g., animation, 3D rendering), single- and multi-character configurations, and complex world-building scenarios, evaluating models along three core dimensions: narrative consistency, character coherence, and visual aesthetics. Methodologically, ViStoryBench proposes a balanced, fine-grained evaluation framework integrating semantic alignment, visual consistency metrics, character ID stability analysis, and a hybrid human-automated assessment protocol. Experimental results demonstrate that ViStoryBench effectively diagnoses narrative logic flaws and visual discontinuities in long-sequence image generation by state-of-the-art models, improving the comparability and interpretability of evaluations. This work fills a critical gap by establishing the first standardized, multidimensional benchmark for story visualization.

๐Ÿ“ Abstract
Story visualization, which aims to generate a sequence of visually coherent images aligning with a given narrative and reference images, has seen significant progress with recent advancements in generative models. To further enhance the performance of story visualization frameworks in real-world scenarios, we introduce a comprehensive evaluation benchmark, ViStoryBench. We collect a diverse dataset encompassing various story types and artistic styles, ensuring models are evaluated across multiple dimensions such as different plots (e.g., comedy, horror) and visual aesthetics (e.g., anime, 3D renderings). ViStoryBench is carefully curated to balance narrative structures and visual elements, featuring stories with single and multiple protagonists to test models' ability to maintain character consistency. Additionally, it includes complex plots and intricate world-building to challenge models in generating accurate visuals. To ensure comprehensive comparisons, our benchmark incorporates a wide range of evaluation metrics assessing critical aspects. This structured and multifaceted framework enables researchers to thoroughly identify both the strengths and weaknesses of different models, fostering targeted improvements.
Problem

Research questions and friction points this paper is trying to address.

Evaluating story visualization models across diverse narratives and styles
Testing character consistency in single and multi-protagonist stories
Assessing model performance on complex plots and world-building
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diverse dataset with various story types
Balanced narrative and visual elements
Wide range of evaluation metrics
Cailin Zhuang
ShanghaiTech University, AIGC Research
Generative AI, Computer Vision, AIGC, Art, Creativity
Ailin Huang
StepFun
Wei Cheng
StepFun
Jingwei Wu
StepFun
Yaoqi Hu
AIGC Research
Jiaqi Liao
StepFun, AGI Lab, Westlake University
Hongyuan Wang
StepFun
Xinyao Liao
Huazhong University of Science and Technology
Weiwei Cai
StepFun
Hengyuan Xu
Fudan University
Trustworthy AI
Xuanyang Zhang
StepFun
Neural Architecture Design, AIGC, 3D Generation, Multi-modal
Xianfang Zeng
StepFun
Zhewei Huang
StepFun
Gang Yu
StepFun
Chi Zhang
AGI Lab, Westlake University