RecBase: Generative Foundation Model Pretraining for Zero-Shot Recommendation

📅 2025-09-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing LLM-based recommender systems exhibit weak cross-domain generalization, primarily because language pretraining objectives are misaligned with recommendation-specific requirements, in particular the modeling of dynamic, item-level user interests. To address this, we propose RecBase, a recommendation-oriented generative foundation model. RecBase introduces a unified item tokenizer that encodes items into hierarchical concept identifiers, coupled with a recommendation-aware autoregressive pretraining objective, enabling both cross-domain semantic alignment and dynamic interest modeling. Pretrained on a large-scale, heterogeneous, cross-domain corpus with a feature-mapping mechanism, it captures complex item-sequence patterns. Across eight real-world datasets, the 1.5B-parameter RecBase achieves zero-shot cross-domain recommendation performance that matches, and in some cases surpasses, that of 7B-parameter general-purpose LLMs.
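The summary does not detail how the unified item tokenizer produces hierarchical concept identifiers. One common way to build such IDs is residual quantization of item embeddings, where each level refines the residual left by the previous one. A minimal illustrative sketch under that assumption (the embeddings, codebook sizes, and the residual-quantization choice are all hypothetical, not the paper's actual design):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: item text embeddings (e.g. from a sentence encoder) of dim 8.
# All sizes here are illustrative, not the paper's configuration.
n_items, dim = 5, 8
item_emb = rng.normal(size=(n_items, dim))

# Three quantization levels, each with a small codebook: coarse -> fine concepts.
levels, codebook_size = 3, 4
codebooks = [rng.normal(size=(codebook_size, dim)) for _ in range(levels)]

def tokenize_item(emb):
    """Residual quantization: at each level, pick the nearest code,
    subtract it, and quantize the remaining residual at the next level,
    yielding a hierarchical concept identifier (coarse -> fine)."""
    residual = emb.copy()
    concept_ids = []
    for cb in codebooks:
        idx = int(np.argmin(((cb - residual) ** 2).sum(axis=1)))
        concept_ids.append(idx)
        residual = residual - cb[idx]
    return tuple(concept_ids)

ids = [tokenize_item(e) for e in item_emb]
```

Because every domain's items map into the same small set of level-wise codebooks, the concept vocabulary is shared across domains, which is what makes the vocabulary-sharing claim above possible.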

📝 Abstract
Recent advances in LLM-based recommendation have shown promise, yet their cross-domain generalization is hindered by a fundamental mismatch between language-centric pretraining and the recommendation task. Existing methods, relying on language-level knowledge, fail to capture dynamic, item-level user interests across domains. To bridge this gap, we propose RecBase, a domain-agnostic foundational model pretrained with a recommendation-oriented objective. RecBase leverages a large-scale, heterogeneous, cross-domain corpus with unified textual representations and feature mappings to enhance cross-domain generalization. To further align item semantics across domains, we introduce a unified item tokenizer that encodes items into hierarchical concept identifiers, enabling structured representation and efficient vocabulary sharing. The model is trained using an autoregressive objective to capture complex item-level sequential patterns. On eight real-world datasets, our 1.5B-parameter model matches or surpasses the performance of LLM baselines up to 7B parameters in zero-shot and cross-domain recommendation tasks.
Problem

Research questions and friction points this paper is trying to address.

Addresses cross-domain generalization in LLM-based recommendation systems
Captures dynamic item-level user interests across diverse domains
Aligns item semantics for structured representation and vocabulary sharing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Domain-agnostic foundation model with recommendation-oriented pretraining objective
Unified item tokenizer encoding items into hierarchical concept identifiers
Autoregressive training capturing complex item-level sequential patterns
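The autoregressive pretraining described above amounts to next-token prediction over sequences of concept-ID tokens. A minimal sketch of that cross-entropy objective, assuming a flattened toy vocabulary and random stand-in logits (a hypothetical setup, not the paper's actual model):

```python
import numpy as np

# A user's interaction history, with each item already tokenized into
# hierarchical concept IDs and flattened into one shared vocabulary.
# Vocabulary size and sequence are illustrative: 2 items x 3 levels.
vocab_size = 12
sequence = [3, 7, 1, 3, 8, 2]

def next_token_loss(logits, targets):
    """Mean autoregressive cross-entropy: the logits at each position
    must predict the next concept-ID token in the sequence."""
    losses = []
    for logit_row, target in zip(logits, targets):
        # Numerically stable log-softmax.
        shifted = logit_row - logit_row.max()
        log_probs = shifted - np.log(np.exp(shifted).sum())
        losses.append(-log_probs[target])
    return float(np.mean(losses))

rng = np.random.default_rng(0)
# Random stand-in for model outputs: one logit row per predicted position.
logits = rng.normal(size=(len(sequence) - 1, vocab_size))
loss = next_token_loss(logits, sequence[1:])
```

With all-zero (uniform) logits the loss reduces to `log(vocab_size)`, a handy sanity check when wiring up such an objective.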