🤖 AI Summary
In generative recommender systems, the semantic ID (SID)-based paradigm saturates quickly as the model scales, whether the component enlarged is the modality encoder, the quantization tokenizer, or the recommender itself. This work identifies one fundamental cause: SIDs have limited capacity to encode item semantic information. In response, the authors revisit the “LLM-as-Recommender” (LLM-as-RS) paradigm, which uses large language models (LLMs) directly as recommenders. In experiments spanning model sizes from 44M to 14B parameters across both paradigms, LLM-as-RS scales better and improves by up to 20% over the best performance that SID-based generative recommendation reaches through scaling. The results also challenge the prevailing belief that LLMs struggle to capture collaborative filtering signals: their ability to model user–item interactions improves as the LLMs scale up. Together, these findings position LLM-as-RS as a promising path toward foundation models for generative recommendation.
📝 Abstract
Recent advancements in generative models have given rise to a promising paradigm for recommender systems (RS), known as Generative Recommendation (GR), which aims to unify rich item semantics and collaborative filtering signals. One popular modern approach is to use semantic IDs (SIDs), which are discrete codes quantized from the embeddings of modality encoders (e.g., large language or vision models), to represent items in an autoregressive user interaction sequence modeling setup (henceforth, SID-based GR). While generative models in other domains exhibit well-established scaling laws, our work reveals that SID-based GR hits significant bottlenecks as the model scales up. In particular, the performance of SID-based GR quickly saturates as we enlarge each component: the modality encoder, the quantization tokenizer, and the RS itself. In this work, we identify the limited capacity of SIDs to encode item semantic information as one of the fundamental bottlenecks. Motivated by this observation, as an initial effort to obtain GR models with better scaling behavior, we revisit another GR paradigm that directly uses large language models (LLMs) as recommenders (henceforth, LLM-as-RS). Our experiments show that the LLM-as-RS paradigm has superior model scaling properties and achieves up to 20% improvement over the best performance that SID-based GR achieves through scaling. We also challenge the prevailing belief that LLMs struggle to capture collaborative filtering information, showing that their ability to model user-item interactions improves as LLMs scale up. Our analyses of both SID-based GR and LLMs across model sizes from 44M to 14B parameters underscore the intrinsic scaling limits of SID-based GR and position LLM-as-RS as a promising path toward foundation models for GR.
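
To make the SID construction described in the abstract concrete, the following is a minimal sketch of how item embeddings might be quantized into discrete codes and how a user history then becomes a token sequence for autoregressive modeling. The abstract does not specify the quantizer, so residual k-means is used here purely as an assumption. The random embeddings stand in for modality-encoder outputs, and the names `fit_kmeans`, `residual_quantize`, and `history_to_tokens` are hypothetical helpers, not the authors' code.

```python
import numpy as np


def fit_kmeans(x, k, iters=10, rng=None):
    """Plain Lloyd's k-means; returns a (k, d) codebook."""
    if rng is None:
        rng = np.random.default_rng(0)
    # Initialize centroids from k distinct data points.
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared distance from every point to every centroid.
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(1)
        for j in range(k):
            members = x[assign == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(0)
    return centroids


def residual_quantize(emb, num_levels=3, codebook_size=8, seed=0):
    """Assumed quantizer: each level clusters the residual left by the previous one."""
    rng = np.random.default_rng(seed)
    residual = emb.astype(np.float64).copy()
    codes = np.zeros((len(emb), num_levels), dtype=np.int64)
    codebooks = []
    for level in range(num_levels):
        codebook = fit_kmeans(residual, codebook_size, rng=rng)
        dists = ((residual[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        codes[:, level] = dists.argmin(1)
        # Subtract the chosen centroid; the next level quantizes what remains.
        residual = residual - codebook[codes[:, level]]
        codebooks.append(codebook)
    return codes, codebooks


def history_to_tokens(history, codes, codebook_size):
    """Flatten a list of item indices into one SID token sequence."""
    tokens = []
    for item in history:
        for level, code in enumerate(codes[item]):
            # Offset by level so codes from different levels get distinct token IDs.
            tokens.append(level * codebook_size + int(code))
    return tokens


# Stand-in for modality-encoder embeddings: 50 items, 16 dimensions.
item_emb = np.random.default_rng(42).normal(size=(50, 16))
codes, codebooks = residual_quantize(item_emb, num_levels=3, codebook_size=8)
tokens = history_to_tokens([3, 17, 5], codes, codebook_size=8)
```

In this toy configuration each item is reduced to 3 codes drawn from codebooks of size 8, so a SID can distinguish at most 8^3 = 512 items, or 9 bits, regardless of how rich the original embedding was. This is only an illustration of why a discrete code can cap the semantic information passed to the recommender; it is not the paper's analysis.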