Story2MIDI: Emotionally Aligned Music Generation from Text

📅 2025-12-01
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the problem of generating emotion-consistent music from text. We propose a cross-modal sequence-to-sequence Transformer framework, whose core innovations are: (1) constructing the first small-scale paired dataset integrating fine-grained textual emotion annotations with corresponding musical affective representations (e.g., valence and arousal), and (2) designing an emotion-aligned attention mechanism that explicitly models semantic–affective mappings between text and music. The model is rigorously evaluated via objective musical metrics—including tonal coherence and rhythmic stability—as well as double-blind human listening experiments. Results demonstrate statistically significant improvements in both emotional alignment (p < 0.01) and perceptual naturalness, while preserving narrative coherence. This work establishes a novel, interpretable, and empirically evaluable paradigm for emotion-driven AI composition.

Technology Category

Application Category

📝 Abstract
In this paper, we introduce Story2MIDI, a sequence-to-sequence Transformer-based model for generating emotion-aligned music from a given piece of text. To develop this model, we construct the Story2MIDI dataset by merging existing datasets for sentiment analysis from text and emotion classification in music. The resulting dataset contains pairs of text blurbs and music pieces that evoke the same emotions in the reader or listener. Despite the small scale of our dataset and limited computational resources, our results indicate that our model effectively learns emotion-relevant features in music and incorporates them into its generation process, producing samples with diverse emotional responses. We evaluate the generated outputs using objective musical metrics and a human listening study, confirming the model's ability to capture intended emotional cues.
Problem

Research questions and friction points this paper is trying to address.

Generates emotion-aligned music from text
Constructs dataset pairing text and music by emotion
Evaluates model using musical metrics and human study
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer model generates emotion-aligned music from text
Dataset merges text sentiment and music emotion classification
Evaluation uses objective metrics and human listening study
🔎 Similar Papers
No similar papers found.