🤖 AI Summary
This paper addresses preference inference for cold-start items (e.g., newly released titles on Netflix) in recommender systems. To overcome the limitations of traditional collaborative filtering, which depends on dense user–item interaction data, the authors propose an end-to-end reasoning framework that combines semantic priors derived from large language models (LLMs) with sparse user behavioral signals. The method introduces a multi-step reasoning strategy, jointly optimized via supervised fine-tuning (SFT) and reward-model-based reinforcement learning fine-tuning, to explicitly model item semantics, user historical intent, and domain-specific constraints. Evaluated on a real-world Netflix cold-start dataset, the approach achieves significant gains in recommendation accuracy, outperforming Netflix's production ranking model by up to 8% on key metrics. This work establishes a scalable and interpretable paradigm for leveraging LLMs in cold-start recommendation, advancing both practical deployment and principled understanding of semantic-augmented preference modeling.
📝 Abstract
Large Language Models (LLMs) have shown significant potential for improving recommendation systems through their inherent reasoning capabilities and extensive knowledge base. Yet, existing studies predominantly address warm-start scenarios with abundant user–item interaction data, leaving underexplored the more challenging cold-start scenarios, where sparse interactions hinder traditional collaborative filtering methods. To address this limitation, we propose novel reasoning strategies designed for cold-start item recommendations within the Netflix domain. Our method utilizes the advanced reasoning capabilities of LLMs to effectively infer user preferences, particularly for newly introduced or rarely interacted items. We systematically evaluate supervised fine-tuning, reinforcement learning-based fine-tuning, and hybrid approaches that combine both methods to optimize recommendation performance. Extensive experiments on real-world data demonstrate significant improvements in both methodological efficacy and practical performance for cold-start recommendation. Remarkably, our reasoning-based fine-tuned models outperform Netflix's production ranking model by up to 8% in certain cases.
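The hybrid schedule described above, supervised fine-tuning followed by reward-model-guided reinforcement learning, can be illustrated with a toy sketch. Nothing here comes from the paper: the policy, the labeled "reasoning trace", the reward model, and all function names are hypothetical stand-ins chosen only to show the two-stage shape of the pipeline, with a supervised warm-up stage pulling the policy toward labeled targets and an RL stage accepting only reward-improving perturbations.

```python
# Hypothetical two-stage fine-tuning sketch; all names and values are
# illustrative, not taken from the paper or any real training stack.
import random

def sft_step(policy, trace, lr=0.5):
    """Supervised step: pull each parameter toward its labeled trace value."""
    return {k: v + lr * (trace[k] - v) for k, v in policy.items()}

def reward(policy, target):
    """Toy reward model: negative squared distance to a preferred profile."""
    return -sum((policy[k] - target[k]) ** 2 for k in policy)

def rl_step(policy, target, rng, noise=0.2):
    """Reward-guided step: keep a random perturbation only if reward improves."""
    candidate = {k: v + rng.uniform(-noise, noise) for k, v in policy.items()}
    return candidate if reward(candidate, target) > reward(policy, target) else policy

# Stage 1 (SFT warm-up): fit to a labeled reasoning trace.
policy = {"semantics": 0.0, "intent": 0.0}
trace = {"semantics": 1.0, "intent": 1.0}      # supervised target
for _ in range(10):
    policy = sft_step(policy, trace)

# Stage 2 (RL refinement): nudge toward the reward model's preference.
preferred = {"semantics": 1.2, "intent": 0.9}  # reward model's optimum
rng = random.Random(42)
for _ in range(200):
    policy = rl_step(policy, preferred, rng)
```

The sketch mirrors the evaluation axes in the abstract only in spirit: SFT alone converges to the labeled trace, while the RL stage shifts the policy toward whatever the reward model prefers, which is why the paper studies the two methods both separately and in combination.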