🤖 AI Summary
Current AI-based music generation research predominantly focuses on text-conditioned modeling, while reference-based song adaptation—such as cover generation—remains severely constrained by the lack of large-scale, structured datasets. Method: We introduce LargeSHS, the largest publicly available music adaptation dataset to date, comprising 1.7 million metadata entries and approximately 900,000 accessible audio links, systematically curated and enriched from the SecondHandSongs platform. We further propose the first structured modeling framework for musical adaptation relationships, enabling adaptation tree construction, performance clustering, and cover family identification to support cross-work semantic mapping. Contribution/Results: LargeSHS bridges critical data gaps in reference-guided music generation and adaptive Music Information Retrieval (MIR), establishing a foundational resource and novel research paradigm for cover generation, style transfer, and music evolution analysis.
📝 Abstract
Recent advances in AI-based music generation have focused heavily on text-conditioned models, with less attention given to reference-based generation such as song adaptation. To support this line of research, we introduce LargeSHS, a large-scale dataset derived from SecondHandSongs, containing over 1.7 million metadata entries and approximately 900k publicly accessible audio links. Unlike existing datasets, LargeSHS includes structured adaptation relationships between musical works, enabling the construction of adaptation trees and performance clusters that represent cover song families. We provide comprehensive statistics and comparisons with existing datasets, highlighting the unique scale and richness of LargeSHS. This dataset paves the way for new research in cover song generation, reference-based music generation, and adaptation-aware MIR tasks.