LargeSHS: A large-scale dataset of music adaptation

📅 2025-11-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current AI-based music generation research predominantly focuses on text-conditioned modeling, while reference-based song adaptation—such as cover generation—remains severely constrained by the lack of large-scale, structured datasets. Method: We introduce LargeSHS, the largest publicly available music adaptation dataset to date, comprising 1.7 million metadata entries and approximately 900,000 accessible audio links, systematically curated and enriched from the SecondHandSongs platform. We further propose the first structured modeling framework for musical adaptation relationships, enabling adaptation tree construction, performance clustering, and cover family identification to support cross-work semantic mapping. Contribution/Results: LargeSHS bridges critical data gaps in reference-guided music generation and adaptive Music Information Retrieval (MIR), establishing a foundational resource and novel research paradigm for cover generation, style transfer, and music evolution analysis.

Technology Category

Application Category

📝 Abstract

Recent advances in AI-based music generation have focused heavily on text-conditioned models, with less attention given to reference-based generation such as song adaptation. To support this line of research, we introduce LargeSHS, a large-scale dataset derived from SecondHandSongs, containing over 1.7 million metadata entries and approximately 900k publicly accessible audio links. Unlike existing datasets, LargeSHS includes structured adaptation relationships between musical works, enabling the construction of adaptation trees and performance clusters that represent cover song families. We provide comprehensive statistics and comparisons with existing datasets, highlighting the unique scale and richness of LargeSHS. This dataset paves the way for new research in cover song generation, reference-based music generation, and adaptation-aware MIR tasks.

Problem

Research questions and friction points this paper is trying to address.

Addresses limited research on reference-based music generation models

Provides structured adaptation relationships between musical works

Enables research in cover song generation and adaptation-aware MIR

Innovation

Methods, ideas, or system contributions that make the work stand out.

LargeSHS dataset with structured adaptation relationships

Contains 1.7 million metadata entries and audio links

Enables cover song generation and adaptation research

🔎 Similar Papers

No similar papers found.

Authors to Follow