Aether Weaver: Multimodal Affective Narrative Co-Generation with Dynamic Scene Graphs

📅 2025-07-29

🤖 AI Summary

Existing approaches predominantly follow a sequential text-to-visual pipeline. They cannot jointly produce coherent textual narratives, dynamic scene graphs, visual imagery, and affective soundscapes, and they keep only weak cross-modal consistency in spatiotemporal structure, semantic relations, and emotional expression. This paper proposes a multimodal narrative co-generation framework that uses a large language model as the narrative engine and combines it with dynamic scene graph management and a control mechanism for multimodal affective consistency. A *narrator* module (text generation), a *director* module (scene graph management and image synthesis), and an *affective controller* work together so that four modalities (text, scene graph, image, and soundscape) evolve concurrently with tight spatiotemporal and affective alignment. Qualitative evaluations on prompts spanning diverse genres show clear improvements over cascaded baselines in narrative depth, visual fidelity, and emotional resonance, supporting rapid creative prototyping and immersive storytelling.
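A minimal sketch of what "affective consistency" could mean in practice: one shared affect target, expressed here as a hypothetical valence/arousal pair, is mapped to cues for every modality, so no modality picks its emotional register independently. The `Affect`, `ModalityCues`, and `map_affect` names, the thresholds, and the 60 to 140 bpm tempo range are illustrative assumptions, not the paper's method.

```python
from dataclasses import dataclass


# Hypothetical affect target: valence in [-1, 1], arousal in [0, 1] (assumed representation).
@dataclass(frozen=True)
class Affect:
    valence: float
    arousal: float


# Per-modality cues derived from a single affect target (illustrative fields only).
@dataclass(frozen=True)
class ModalityCues:
    text_style: str       # instruction appended to the narrator prompt
    image_palette: str    # style modifier appended to the image prompt
    sound_tempo_bpm: int  # target tempo for the soundscape
    sound_mode: str       # "major" or "minor"


def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def map_affect(a: Affect) -> ModalityCues:
    """Derive every modality's cues from the same affect so they cannot drift apart."""
    v = clamp(a.valence, -1.0, 1.0)
    r = clamp(a.arousal, 0.0, 1.0)
    tone = "hopeful" if v > 0.2 else "somber" if v < -0.2 else "neutral"
    pace = "urgent" if r > 0.6 else "measured"
    palette = "warm, saturated" if v > 0.2 else "cold, desaturated" if v < -0.2 else "balanced"
    tempo = int(round(60 + 80 * r))  # assumed range: 60 bpm when calm, 140 bpm at peak arousal
    mode = "major" if v >= 0 else "minor"
    return ModalityCues(
        text_style=f"{pace}, {tone} tone",
        image_palette=f"{palette} palette",
        sound_tempo_bpm=tempo,
        sound_mode=mode,
    )
```

For example, `map_affect(Affect(valence=-0.7, arousal=0.9))` yields an "urgent, somber tone" for the text, a "cold, desaturated palette" for the image, and a 132 bpm minor-mode soundscape. Routing all modalities through one function is the design choice being illustrated; a learned mapper would replace the hand-set thresholds.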

📝 Abstract

We introduce Aether Weaver, a novel integrated framework for multimodal narrative co-generation that overcomes the limitations of sequential text-to-visual pipelines. Our system concurrently synthesizes textual narratives, dynamic scene graph representations, visual scenes, and affective soundscapes through a tightly integrated co-generation mechanism. At its core, the Narrator, a large language model, generates narrative text and multimodal prompts, while the Director acts as a dynamic scene graph manager: it analyzes the text to build and maintain a structured representation of the story's world, ensuring spatio-temporal and relational consistency for visual rendering and subsequent narrative generation. Additionally, a Narrative Arc Controller guides the high-level story structure and influences multimodal affective consistency, complemented by an Affective Tone Mapper that ensures congruent emotional expression across all modalities. Through qualitative evaluations on a diverse set of narrative prompts spanning various genres, we demonstrate that Aether Weaver significantly enhances narrative depth, visual fidelity, and emotional resonance compared to cascaded baseline approaches. This integrated framework provides a robust platform for rapid creative prototyping and immersive storytelling experiences.
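The Director's role, keeping a persistent world state that both image prompts and later narrative steps read from, can be sketched as a small scene graph. This is a minimal sketch under assumed conventions: entities with string attributes, (subject, relation, object) triples, a beat-indexed change log, and one hard-coded consistency rule (an entity is `in` only one place at a time). `SceneGraph` and its methods are hypothetical names; how the paper extracts graph updates from text is not shown here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Hypothetical world state: entities with attributes plus (subject, relation, object) edges."""
    entities: dict[str, dict[str, str]] = field(default_factory=dict)
    relations: set[tuple[str, str, str]] = field(default_factory=set)
    history: list[tuple[int, str]] = field(default_factory=list)  # (beat, change description)

    def add_entity(self, beat: int, name: str, **attrs: str) -> None:
        # Create the entity or merge new attributes into an existing one.
        self.entities.setdefault(name, {}).update(attrs)
        self.history.append((beat, f"entity {name} {attrs}"))

    def relate(self, beat: int, subj: str, rel: str, obj: str) -> None:
        # Reject edges to unknown entities so prompts never mention things absent from the world.
        for n in (subj, obj):
            if n not in self.entities:
                raise KeyError(f"unknown entity: {n}")
        # Assumed spatial rule: an entity is "in" at most one place at a time.
        if rel == "in":
            self.relations = {r for r in self.relations if not (r[0] == subj and r[1] == "in")}
        self.relations.add((subj, rel, obj))
        self.history.append((beat, f"{subj} {rel} {obj}"))

    def location_of(self, name: str) -> str | None:
        for s, r, o in self.relations:
            if s == name and r == "in":
                return o
        return None

    def to_prompt(self) -> str:
        """Serialize the current state, e.g. for an image prompt or the next narrator call."""
        def fmt(name: str, attrs: dict[str, str]) -> str:
            if not attrs:
                return name
            inner = ", ".join(f"{k}={v}" for k, v in sorted(attrs.items()))
            return f"{name} ({inner})"

        parts = [fmt(n, a) for n, a in sorted(self.entities.items())]
        parts += [f"{s} {r} {o}" for s, r, o in sorted(self.relations)]
        return "; ".join(parts)
```

Usage: after `add_entity(1, "Mira", role="cartographer")`, `add_entity(1, "tower", state="ruined")`, `add_entity(1, "harbor")`, `relate(1, "Mira", "in", "tower")`, and `relate(2, "Mira", "in", "harbor")`, the graph places Mira only in the harbor, and `to_prompt()` returns `"Mira (role=cartographer); harbor; tower (state=ruined); Mira in harbor"`. Rules like the exclusive `in` relation are where spatio-temporal consistency would be enforced before anything is rendered.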

Problem

Research questions and friction points this paper is trying to address.

Overcoming limitations of sequential text-to-visual pipelines

Ensuring spatio-temporal and relational consistency in narratives

Enhancing narrative depth, visual fidelity, and emotional resonance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrated co-generation mechanism for multimodal narratives

Dynamic scene graph manager ensures spatio-temporal consistency

Affective Tone Mapper maintains emotional congruence across modalities
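To make the contrast with cascaded pipelines concrete, here is a toy co-generation loop, assuming a hypothetical arc controller that assigns each beat a target tension. Within each beat, text, image prompt, and sound cue are all produced in the same step from the same target, rather than visuals and sound being derived afterwards from finished text. The narrator is a stub standing in for the LLM; `arc_tension`, `co_generate`, the 75% climax point, and the 0.5 threshold are illustrative assumptions, not the paper's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Beat:
    index: int
    target_tension: float
    text: str
    image_prompt: str
    sound_cue: str


def arc_tension(i: int, n: int) -> float:
    """Hypothetical arc controller: tension rises linearly to a climax at 75% of the story, then resolves."""
    t = i / max(n - 1, 1)  # normalized story position in [0, 1]
    peak = 0.75
    return t / peak if t <= peak else (1.0 - t) / (1.0 - peak)


def narrator_stub(i: int, tension: float, setting: str) -> str:
    # Stand-in for the LLM narrator; a real system would prompt the model with the setting and target tension.
    return f"Beat {i} in {setting} (tension {tension:.2f})."


def co_generate(n_beats: int, setting: str) -> list[Beat]:
    beats = []
    for i in range(n_beats):
        tension = arc_tension(i, n_beats)
        # Every modality for this beat is conditioned on the same target in the same step.
        high = tension > 0.5
        beats.append(Beat(
            index=i,
            target_tension=tension,
            text=narrator_stub(i, tension, setting),
            image_prompt=f"{setting}, {'dramatic' if high else 'calm'} lighting",
            sound_cue="percussive" if high else "ambient",
        ))
    return beats
```

With `co_generate(5, "the harbor")`, tension climbs to 1.0 at beat 3 and resolves to 0.0 at beat 4, and the image and sound cues switch register at exactly the same beats as the text, because they share one target.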

🔎 Similar Papers

The Art of Storytelling: Multi-Agent Generative AI for Dynamic Multimodal Narratives