Expressive Music Data Processing and Generation

📅 2025-03-14
📈 Citations: 0
Influential: 0

🤖 AI Summary
To address the lack of expressivity and structural coherence in AI-generated music, this paper proposes an auditory-perception-driven generative framework. First, Weber’s Law is introduced into music data preprocessing to faithfully model subtle performance variations. Second, a conditional dependency model over multidimensional musical parameters is constructed, enabling joint modeling of multiple output sequences via the probabilistic chain rule. Third, a stepwise conditional sampling strategy coupled with entropy-guided filtering is designed, integrating information-theoretic aesthetics to quantify listener pleasure and information gain. Experiments demonstrate significant improvements over baselines in expressivity fidelity and phrase-level coherence, as validated by MuseScore metrics, Earth Mover’s Distance (EMD), and subjective evaluations. Moreover, this work establishes the first interpretable information-aesthetic evaluation framework for AI music generation.
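
The summary lists Earth Mover's Distance (EMD) among the evaluation measures but does not say which musical features it was computed on. As a minimal sketch, assume EMD compares one-dimensional histograms over shared bins, for example MIDI velocity counts from human reference performances versus generated output. In one dimension, EMD reduces to the L1 distance between the two cumulative distributions. The helper `emd_1d` and the counts below are hypothetical illustrations, not the paper's code or data.

```python
from itertools import accumulate


def emd_1d(hist_a, hist_b, bin_width=1.0):
    """Earth Mover's Distance between two 1-D histograms defined on the same bins.

    In one dimension the optimal transport cost equals the L1 distance between
    the two cumulative distribution functions, scaled by the bin spacing.
    """
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must share the same bins")
    total_a, total_b = sum(hist_a), sum(hist_b)
    # Normalize counts to probability mass so corpora of different sizes compare fairly.
    cdf_a = accumulate(x / total_a for x in hist_a)
    cdf_b = accumulate(x / total_b for x in hist_b)
    return bin_width * sum(abs(a - b) for a, b in zip(cdf_a, cdf_b))


# Hypothetical velocity-bin counts over 4 loudness bins; not data from the paper.
reference = [0, 2, 5, 3]
generated = [1, 3, 4, 2]
print(emd_1d(reference, generated))
```

Unlike a bin-by-bin difference, EMD grows with how far mass must move: shifting all mass by two bins costs twice as much as shifting it by one, which suits ordered attributes such as velocity or duration.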

📝 Abstract
Musical expressivity and coherence are indispensable in music composition and performance, yet they are often neglected in modern AI generative models. In this work, we introduce a listening-based data-processing technique that captures the expressivity of musical performance. Derived from Weber's law, this technique reflects the perceptual reality of human listening and preserves musical subtlety and expressivity in the training input. To promote musical coherence, we model the interdependencies among multiple output attributes of the music data, such as pitch, duration, and velocity, in the neural networks based on the probabilistic chain rule. In practice, we decompose the multi-output sequential model into single-output submodels and condition each subsequent submodel on the previously sampled outputs to induce conditional distributions. Finally, to select eligible sequences from all generations, we propose a tentative measure based on the output entropy. The entropy sequence serves as a criterion for selecting predictable and stable generations, and it is further studied in the context of informational aesthetic measures to quantify musical pleasure and information gain along the musical tendency.
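
The abstract does not spell out the Weber's-law processing rule. One standard reading of Weber's law is that the just-noticeable difference ΔI is proportional to the magnitude I (ΔI / I ≈ k), which suggests quantization bins that are uniform on a logarithmic scale. The sketch below assumes that reading and applies it to note durations in seconds. The helpers `weber_bin` and `bin_center`, the 50 ms floor, and the Weber fraction k = 0.1 are illustrative assumptions, not values from the paper.

```python
import math


def weber_bin(value, v_min, weber_fraction):
    """Map a positive magnitude to a bin index on a geometric (Weber-law) grid.

    Bin i covers [v_min * (1 + k) ** i, v_min * (1 + k) ** (i + 1)), so every bin
    spans the same *relative* change k rather than the same absolute step.
    """
    if value <= v_min:
        return 0  # clamp magnitudes at or below the floor into the first bin
    return math.floor(math.log(value / v_min) / math.log1p(weber_fraction))


def bin_center(index, v_min, weber_fraction):
    """Geometric midpoint of bin `index`, used to map a bin back to a magnitude."""
    return v_min * (1 + weber_fraction) ** (index + 0.5)


# Hypothetical settings: 50 ms floor, 10% Weber fraction.
V_MIN, K = 0.05, 0.1

# A 10 ms difference is resolved for short notes but merged for long ones.
print(weber_bin(0.05, V_MIN, K), weber_bin(0.06, V_MIN, K))  # different bins
print(weber_bin(0.50, V_MIN, K), weber_bin(0.51, V_MIN, K))  # same bin
```

Dequantizing with the geometric midpoint keeps the reconstruction error within a fixed relative tolerance (roughly ±5% for k = 0.1) instead of a fixed absolute one, which is the property a perception-based encoding is meant to preserve.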
Problem

Research questions and friction points this paper is trying to address.

Captures musical expressivity using listening-based data processing.
Models output interdependencies for musical coherence in neural networks.
Proposes an entropy-based measure to select predictable and stable music sequences.
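
The chain-rule treatment of output interdependencies can be written as p(pitch, duration, velocity | history) = p(pitch | history) · p(duration | pitch, history) · p(velocity | pitch, duration, history). The sketch below assumes this attribute order (the source names pitch, duration, and velocity but does not fix an order) and replaces the neural submodels with hypothetical toy functions (`pitch_model`, `duration_model`, `velocity_model`) that return probability tables, so that only the stepwise conditional sampling logic is shown. Durations are assumed to be in beats.

```python
import random


def sample_from(dist, rng):
    """Draw one value from a {value: probability} table."""
    values = list(dist)
    return rng.choices(values, weights=[dist[v] for v in values], k=1)[0]


# Hypothetical single-output submodels standing in for the neural networks.
def pitch_model(history):
    last = history[-1][0] if history else 60  # start from middle C (MIDI 60)
    return {last - 2: 0.25, last: 0.25, last + 2: 0.5}


def duration_model(history, pitch):
    # Toy dependency on the already-sampled pitch: higher notes lean shorter.
    return {0.25: 0.7, 0.5: 0.3} if pitch > 62 else {0.5: 0.6, 1.0: 0.4}


def velocity_model(history, pitch, duration):
    # Toy dependency on the sampled duration: longer notes lean louder.
    return {80: 0.8, 64: 0.2} if duration >= 0.5 else {64: 0.7, 48: 0.3}


def sample_note(history, rng):
    """Sample one note attribute by attribute, following the chain rule
    p(p, d, v | h) = p(p | h) * p(d | p, h) * p(v | p, d, h)."""
    pitch = sample_from(pitch_model(history), rng)
    duration = sample_from(duration_model(history, pitch), rng)
    velocity = sample_from(velocity_model(history, pitch, duration), rng)
    return pitch, duration, velocity


def generate(n_notes, seed=0):
    rng = random.Random(seed)
    history = []
    for _ in range(n_notes):
        history.append(sample_note(history, rng))  # condition on all earlier notes
    return history


print(generate(8, seed=1))
```

Because each submodel outputs a single attribute, it can stay small and be trained separately, while the joint distribution is still recovered exactly by multiplying the conditionals in the chosen order.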
Innovation

Methods, ideas, or system contributions that make the work stand out.

Listening-based data processing captures musical expressivity.
Probabilistic chain rule models output interdependencies in music.
Output entropy measures select predictable, stable music sequences.
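
As a minimal sketch of entropy-based selection, assume each candidate generation records the model's predictive distribution at every step. The Shannon entropy of each distribution forms the candidate's entropy sequence, and candidates whose sequence is low on average (predictable) and has a small spread (stable) are kept. The source describes the measure as tentative and gives no concrete rule, so the mean and standard-deviation thresholds and the helper names (`entropy_bits`, `entropy_sequence`, `select_generations`) are assumptions.

```python
import math
from statistics import mean, pstdev


def entropy_bits(dist):
    """Shannon entropy, in bits, of one step's predictive distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


def entropy_sequence(step_distributions):
    """Per-step entropies of one generated sequence."""
    return [entropy_bits(d) for d in step_distributions]


def select_generations(candidates, max_mean=1.5, max_std=0.5):
    """Keep candidates whose entropy sequence is low on average (predictable)
    and has a small spread (stable). Thresholds are illustrative assumptions."""
    kept = []
    for name, steps in candidates.items():
        h = entropy_sequence(steps)
        if mean(h) <= max_mean and pstdev(h) <= max_std:
            kept.append(name)
    return kept


# Hypothetical per-step distributions for three candidate generations.
candidates = {
    "steady": [[0.7, 0.2, 0.1]] * 4,  # moderate, constant uncertainty
    "erratic": [[1.0, 0.0, 0.0, 0.0], [0.25] * 4] * 2,  # alternates 0 and 2 bits
    "diffuse": [[0.25] * 4] * 4,  # uniformly uncertain: 2 bits every step
}
print(select_generations(candidates))  # ['steady']
```

The population standard deviation is only one simple stability term; a windowed variance or a trend test on the entropy sequence would fit the same interface.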