🤖 AI Summary
Visual narrative understanding suffers from inconsistent and redundant symbolic graph representations: semantically similar events or actions are fragmented across disparate annotations, undermining the robustness and generalization of downstream reasoning. To address this, we propose a hierarchical semantic normalization framework for narrative knowledge graphs, enabling multi-granular structural modeling across panel-, event-, and story-level graphs together with cross-level symbolic alignment. Our method combines lexical similarity analysis with embedding-driven clustering, augmented by cognition-inspired mechanisms that standardize action and event semantics. Evaluated on the Manga109 dataset, the normalized graph improves coherence and robustness on action retrieval, character localization, and event summarization (+12.7% F1), while preserving the interpretability and transparency of the symbolic representation.
📝 Abstract
Understanding visual narratives such as comics requires structured representations that capture events, characters, and their relations across multiple levels of story organization. However, symbolic narrative graphs often suffer from inconsistency and redundancy, where similar actions or events are labeled differently across annotations or contexts. Such variance limits the effectiveness of reasoning and generalization.
This paper introduces a semantic normalization framework for hierarchical narrative knowledge graphs. Building on cognitively grounded models of narrative comprehension, we propose methods that consolidate semantically related actions and events using lexical similarity and embedding-based clustering. The normalization process reduces annotation noise, aligns symbolic categories across narrative levels, and preserves interpretability.
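To make the consolidation step concrete, the combination of lexical similarity and embedding-based clustering can be sketched as below. This is a minimal illustration, not the paper's actual pipeline: the toy action labels, the character-bigram "embedding" stand-in, the similarity thresholds, and the single-link union-find clustering are all hypothetical choices for the sketch.

```python
from difflib import SequenceMatcher

# Hypothetical action labels; in the paper these would come from graph annotations.
labels = ["run", "running", "sprint", "talk", "talking", "speak"]

def embed(label):
    # Stand-in for a learned embedding: the set of character bigrams.
    return {label[i:i + 2] for i in range(len(label) - 1)}

def embed_sim(a, b):
    # Jaccard overlap of bigram sets as a cheap proxy for embedding similarity.
    return len(a & b) / len(a | b) if a | b else 0.0

def similar(x, y, lex_t=0.6, emb_t=0.4):
    # Merge two labels if EITHER the lexical or the embedding signal fires.
    lex = SequenceMatcher(None, x, y).ratio()   # lexical similarity
    emb = embed_sim(embed(x), embed(y))         # embedding similarity
    return lex >= lex_t or emb >= emb_t

# Single-link clustering via union-find: transitively merge similar labels.
parent = {l: l for l in labels}

def find(l):
    while parent[l] != l:
        parent[l] = parent[parent[l]]  # path compression
        l = parent[l]
    return l

for i, x in enumerate(labels):
    for y in labels[i + 1:]:
        if similar(x, y):
            parent[find(y)] = find(x)

# Group labels by cluster root: each cluster becomes one normalized symbol.
clusters = {}
for l in labels:
    clusters.setdefault(find(l), []).append(l)
print(clusters)
```

With these toy thresholds, inflectional variants such as "run"/"running" and "talk"/"talking" collapse into shared clusters, while purely synonymous pairs like "talk"/"speak" do not; this is exactly the gap the paper's embedding-driven component (using real learned embeddings rather than character bigrams) is meant to close.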
We demonstrate the framework on annotated manga stories from the Manga109 dataset, applying normalization to panel-, event-, and story-level graphs. Preliminary evaluations across narrative reasoning tasks, such as action retrieval, character grounding, and event summarization, show that semantic normalization improves coherence and robustness, while maintaining symbolic transparency. These findings suggest that normalization is a key step toward scalable, cognitively inspired graph models for multimodal narrative understanding.