Poetry in Pixels: Prompt Tuning for Poem Image Generation via Diffusion Models

📅 2025-01-10
🤖 AI Summary
Diffusion models struggle to capture the deeper semantics of poetry, such as metaphor, emotion, and imagery. This paper proposes PoemToPixel, a framework that generates images aligned with a poem's meaning. It contributes (1) PoeKey, an algorithm that extracts three kinds of instructions from a poem: emotions, visual elements, and themes; (2) MiniPo, a novel multimodal dataset of 1,001 children's poems with images; and (3) a prompt tuning strategy that conditions a diffusion model on the extracted instructions. The authors evaluate PoemToPixel quantitatively and qualitatively on MiniPo and PoemSum and report that the approach is effective.

📝 Abstract
The task of text-to-image generation has encountered significant challenges when applied to literary works, especially poetry. Poems are a distinct form of literature, with meanings that frequently transcend the literal words. To address this shortcoming, we propose the PoemToPixel framework, designed to generate images that visually represent the inherent meanings of poems. Our approach incorporates prompt tuning into the image generation framework to ensure that the resulting images closely align with the poetic content. In addition, we propose the PoeKey algorithm, which extracts three key elements from poems (emotions, visual elements, and themes) to form instructions that are then provided to a diffusion model for generating corresponding images. Furthermore, to expand the diversity of poetry datasets across different genres and ages, we introduce MiniPo, a novel multimodal dataset comprising 1001 children's poems and images. Leveraging this dataset alongside PoemSum, we conducted both quantitative and qualitative evaluations of image generation using our PoemToPixel framework. This paper demonstrates the effectiveness of our approach and offers a fresh perspective on generating images from literary sources.
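To make the instruction-forming step concrete, the sketch below shows one way three extracted elements could be combined into a text prompt for a diffusion model. It is only an illustration under stated assumptions: the abstract does not describe how PoeKey extracts the elements or how the instructions are worded, so `PoemInstruction`, `build_prompt`, the `style` parameter, and the prompt template are all hypothetical names and choices, and the element values are supplied by hand rather than extracted.

```python
from dataclasses import dataclass, field


@dataclass
class PoemInstruction:
    """Hypothetical container for the three elements named in the abstract."""
    emotion: str = ""
    visual_elements: list[str] = field(default_factory=list)
    theme: str = ""


def build_prompt(instruction: PoemInstruction, style: str = "illustration") -> str:
    """Join the three elements into a single prompt string.

    The template (field order, separators, labels) is an assumption made
    for illustration; it is not taken from the paper.
    """
    if instruction.visual_elements:
        parts = [f"{style} of {', '.join(instruction.visual_elements)}"]
    else:
        parts = [style]
    if instruction.theme:
        parts.append(f"theme: {instruction.theme}")
    if instruction.emotion:
        parts.append(f"mood: {instruction.emotion}")
    return "; ".join(parts)


# Hand-written example values; a real system would obtain these from the poem.
example = PoemInstruction(
    emotion="joyful",
    visual_elements=["a kite", "a windy hill"],
    theme="childhood play",
)
prompt = build_prompt(example)
print(prompt)
```

The resulting string could then be passed as the text prompt to any text-to-image diffusion pipeline; which model and tuning procedure the authors actually use is not specified in the abstract.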
Problem

Research questions and friction points this paper is trying to address.

Poetry Visualization
Deep Meaning
Emotional Expression
Innovation

Methods, ideas, or system contributions that make the work stand out.

PoemToPixel
PoeKey algorithm
MiniPo dataset
Sofia Jamil
PhD Research Scholar
Large Language Model, Natural Language Processing, Text to Image Generation Models
Bollampalli Areen Reddy
Department of Computer Science & Engineering, Indian Institute of Technology Patna, India
Raghvendra Kumar
Department of Computer Science & Engineering, Indian Institute of Technology Patna, India
Sriparna Saha
Department of Computer Science & Engineering, Indian Institute of Technology Patna, India
K. J. Joseph
Adobe Research
Koustava Goswami
Research Scientist 2 @ Adobe Research
Natural Language Processing, Language Model, Multimodal Learning