🤖 AI Summary
This study addresses joint melody and bassline generation conditioned on chord progressions, aiming to improve theoretical correctness and stylistic coherence in pitch content, interval distributions, and chord-tone usage. We propose five Transformer-based chord-conditioned generation paradigms and systematically compare their modeling capabilities. Crucially, we introduce the first "bass-first" two-stage generation strategy, in which the bassline is generated before the melody and serves as a strong structural constraint on it. Using music-theory-driven quantitative metrics, including chord-tone coverage and interval-distribution deviation, we empirically demonstrate that chord conditioning significantly enhances generation quality. The bass-first model achieves the strongest stylistic fidelity, particularly in tonal logic and voice-leading plausibility, outperforming baselines by an average of 12.7%. This work establishes a paradigm for incorporating structured musical priors into generative models.
📝 Abstract
We evaluate five Transformer-based strategies for chord-conditioned melody and bass generation using a set of music-theory-motivated metrics capturing pitch content, pitch interval size, and chord tone usage. The evaluated models are (1) no chord conditioning, (2) independent per-line chord-conditioned generation, (3) bass-first chord-conditioned generation, (4) melody-first chord-conditioned generation, and (5) chord-conditioned co-generation. We show that chord conditioning improves the replication of stylistic pitch content and chord tone usage, particularly for the bass-first model.
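To make the chord-tone-usage metric concrete, here is a minimal sketch of how chord-tone coverage (the fraction of generated notes whose pitch class belongs to the chord active at that moment) could be computed. The data format is an assumption for illustration, not the paper's actual representation: notes are `(midi_pitch, chord_index)` pairs, and chords are sets of pitch classes (0-11).

```python
def chord_tone_coverage(notes, chord_pitch_classes):
    """Fraction of notes whose pitch class is in the active chord.

    notes: list of (midi_pitch, chord_index) pairs (illustrative format).
    chord_pitch_classes: dict mapping chord_index -> set of pitch classes 0-11.
    """
    if not notes:
        return 0.0
    hits = sum(
        1 for pitch, ci in notes
        if pitch % 12 in chord_pitch_classes[ci]
    )
    return hits / len(notes)

# Hypothetical example: a C major chord (pitch classes C=0, E=4, G=7)
chords = {0: {0, 4, 7}}
# Melody notes C4, D4, E4, G4 over that chord: three of four are chord tones
melody = [(60, 0), (62, 0), (64, 0), (67, 0)]
print(chord_tone_coverage(melody, chords))  # 0.75
```

A stylistic-fidelity comparison would then contrast this coverage (and, analogously, a histogram distance over melodic interval sizes) between generated output and a reference corpus.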