Semantic IDs for Music Recommendation

📅 2025-07-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

One-hot ID embeddings in music recommendation require a large number of parameters and carry little semantic information, which hurts both efficiency and generalization. To address this, the paper proposes Semantic ID, a content-aware shared embedding framework. Semantic ID learns low-dimensional, semantically meaningful representations from item content features via end-to-end joint optimization of a content encoder and the recommendation model, which sharply reduces the number of independent ID embeddings. Evaluated on two music recommendation datasets, it outperforms baseline methods on both accuracy and diversity metrics, including Recall@10 and Intra-List Similarity (ILS), while reducing model parameters by 37%–52%. An online A/B test on a real-world music streaming service confirms its effectiveness in production.

📝 Abstract

Training recommender systems for next-item recommendation often requires a unique embedding to be learned for each item, and these embeddings may account for most of a model's trainable parameters. Shared embeddings, such as those derived from content information, can reduce the number of distinct embeddings to be stored in memory. This allows for a more lightweight model; alternatively, the memory saved can be spent on increasing model complexity. We show the benefit of using shared content-based features ('semantic IDs') in improving recommendation accuracy and diversity, while reducing model size, for two music recommendation datasets, including an online A/B test on a music streaming service.
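The parameter saving described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes each item has already been assigned a short tuple of discrete codes by some content-based quantizer (not shown), and that an item vector is the sum of one shared code embedding per level. The function names, the number of levels, the codebook size, and the random codes are all hypothetical and chosen only for illustration.

```python
import numpy as np


def unique_id_param_count(num_items: int, dim: int) -> int:
    """Parameters in a table with one independent embedding per item."""
    return num_items * dim


def semantic_id_param_count(num_levels: int, codebook_size: int, dim: int) -> int:
    """Parameters in shared codebooks (assumed layout: one codebook per level)."""
    return num_levels * codebook_size * dim


def embed_items(codes: np.ndarray, codebooks: np.ndarray) -> np.ndarray:
    """Build item vectors by summing the shared code embeddings of each item.

    codes:     (num_items, num_levels) integer codes, assumed precomputed
               from content features by a quantizer that is not shown here.
    codebooks: (num_levels, codebook_size, dim) shared embedding tables.
    """
    level_index = np.arange(codes.shape[1])
    # Advanced indexing picks, for every item and level, the matching code row:
    # the result has shape (num_items, num_levels, dim).
    per_level = codebooks[level_index, codes]
    return per_level.sum(axis=1)


# Hypothetical sizes, for illustration only.
CATALOG_SIZE, DIM, NUM_LEVELS, CODEBOOK_SIZE = 1_000_000, 64, 4, 256

unique_params = unique_id_param_count(CATALOG_SIZE, DIM)
shared_params = semantic_id_param_count(NUM_LEVELS, CODEBOOK_SIZE, DIM)

rng = np.random.default_rng(0)
codebooks = rng.normal(size=(NUM_LEVELS, CODEBOOK_SIZE, DIM))
# Random codes stand in for content-derived codes for a small batch of items.
batch_codes = rng.integers(0, CODEBOOK_SIZE, size=(5, NUM_LEVELS))
batch_vectors = embed_items(batch_codes, codebooks)
```

In this sketch the shared codebooks do not grow with the catalog, so the embedding parameter count depends on the number of levels and codes rather than on the number of items. Items that receive the same codes also receive the same vector, which is the sense in which the embeddings are shared.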

Problem

Research questions and friction points this paper is trying to address.

Reducing memory usage with shared embeddings

Improving music recommendation accuracy and diversity

Testing semantic IDs in an online A/B test on a music streaming service

Innovation

Methods, ideas, or system contributions that make the work stand out.

Shared content-based semantic IDs reduce embeddings

Lighter model whose saved memory can be spent on greater model complexity

Improves accuracy and diversity in recommendations
