RecBase: Generative Foundation Model Pretraining for Zero-Shot Recommendation

📅 2025-09-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing LLM-based recommender systems exhibit weak cross-domain generalization, primarily because language pretraining objectives are misaligned with recommendation-specific requirements, in particular the modeling of dynamic, item-level user interests. To address this, we propose RecBase, a recommendation-oriented generative foundation model. RecBase introduces a unified item tokenizer that encodes items into hierarchical concept identifiers, coupled with a recommendation-aware autoregressive pretraining objective, enabling both cross-domain semantic alignment and dynamic interest modeling. Pretrained on a large-scale, heterogeneous, cross-domain corpus with a feature-mapping mechanism, it captures complex item-sequence patterns. Across eight real-world datasets, the 1.5B-parameter RecBase achieves zero-shot cross-domain recommendation performance that matches, and in some cases surpasses, that of 7B-parameter general-purpose LLMs.
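The summary does not detail how the unified item tokenizer produces hierarchical concept identifiers. One common way to build such IDs is residual quantization of item embeddings, where each level refines the residual left by the previous one. A minimal illustrative sketch under that assumption (the embeddings, codebook sizes, and the residual-quantization choice are all hypothetical, not the paper's actual design):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: item text embeddings (e.g. from a sentence encoder) of dim 8.
# All sizes here are illustrative, not the paper's configuration.
n_items, dim = 5, 8
item_emb = rng.normal(size=(n_items, dim))

# Three quantization levels, each with a small codebook: coarse -> fine concepts.
levels, codebook_size = 3, 4
codebooks = [rng.normal(size=(codebook_size, dim)) for _ in range(levels)]

def tokenize_item(emb):
    """Residual quantization: at each level, pick the nearest code,
    subtract it, and quantize the remaining residual at the next level,
    yielding a hierarchical concept identifier (coarse -> fine)."""
    residual = emb.copy()
    concept_ids = []
    for cb in codebooks:
        idx = int(np.argmin(((cb - residual) ** 2).sum(axis=1)))
        concept_ids.append(idx)
        residual = residual - cb[idx]
    return tuple(concept_ids)

ids = [tokenize_item(e) for e in item_emb]
```

Because every domain's items map into the same small set of level-wise codebooks, the concept vocabulary is shared across domains, which is what makes the vocabulary-sharing claim above possible.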

📝 Abstract
Recent advances in LLM-based recommendation have shown promise, yet their cross-domain generalization is hindered by a fundamental mismatch between language-centric pretraining and the recommendation task. Existing methods, relying on language-level knowledge, fail to capture dynamic, item-level user interests across domains. To bridge this gap, we propose RecBase, a domain-agnostic foundational model pretrained with a recommendation-oriented objective. RecBase leverages a large-scale, heterogeneous, cross-domain corpus with unified textual representations and feature mappings to enhance cross-domain generalization. To further align item semantics across domains, we introduce a unified item tokenizer that encodes items into hierarchical concept identifiers, enabling structured representation and efficient vocabulary sharing. The model is trained using an autoregressive objective to capture complex item-level sequential patterns. On eight real-world datasets, our 1.5B-parameter model matches or surpasses the performance of LLM baselines up to 7B parameters in zero-shot and cross-domain recommendation tasks.
Problem

Research questions and friction points this paper is trying to address.

Addresses cross-domain generalization in LLM-based recommendation systems
Captures dynamic item-level user interests across diverse domains
Aligns item semantics for structured representation and vocabulary sharing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Domain-agnostic foundation model with recommendation-oriented pretraining objective
Unified item tokenizer encoding items into hierarchical concept identifiers
Autoregressive training capturing complex item-level sequential patterns
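The autoregressive pretraining described above amounts to next-token prediction over sequences of concept-ID tokens. A minimal sketch of that cross-entropy objective, assuming a flattened toy vocabulary and random stand-in logits (a hypothetical setup, not the paper's actual model):

```python
import numpy as np

# A user's interaction history, with each item already tokenized into
# hierarchical concept IDs and flattened into one shared vocabulary.
# Vocabulary size and sequence are illustrative: 2 items x 3 levels.
vocab_size = 12
sequence = [3, 7, 1, 3, 8, 2]

def next_token_loss(logits, targets):
    """Mean autoregressive cross-entropy: the logits at each position
    must predict the next concept-ID token in the sequence."""
    losses = []
    for logit_row, target in zip(logits, targets):
        # Numerically stable log-softmax.
        shifted = logit_row - logit_row.max()
        log_probs = shifted - np.log(np.exp(shifted).sum())
        losses.append(-log_probs[target])
    return float(np.mean(losses))

rng = np.random.default_rng(0)
# Random stand-in for model outputs: one logit row per predicted position.
logits = rng.normal(size=(len(sequence) - 1, vocab_size))
loss = next_token_loss(logits, sequence[1:])
```

With all-zero (uniform) logits the loss reduces to `log(vocab_size)`, a handy sanity check when wiring up such an objective.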