Tapping the Potential of Large Language Models as Recommender Systems: A Comprehensive Framework and Empirical Analysis

📅 2024-01-10
📈 Citations: 22
✨ Influential: 2
🤖 AI Summary
The effectiveness and applicability boundaries of large language models (LLMs) in recommendation tasks remain poorly understood. Method: We propose a unified prompt engineering framework that reformulates recommendation as natural language inference, enabling zero-shot and cross-scenario generalization. We conduct controlled, multi-dimensional experiments on MovieLens and Amazon datasets to isolate the independent effects of LLM architecture, parameter scale, context length, and four prompt componentsโ€”task description, user interest modeling, candidate item construction, and prompting strategy. Contribution/Results: Our study establishes a reproducible evaluation paradigm and demonstrates that LLMs possess intrinsic zero-shot recommendation capability. However, prompt quality and fidelity of user interest modeling constitute critical bottlenecks. Structurally optimizing prompts yields substantial performance gains. This work provides both an empirically grounded benchmark and a practical, deployable technical pathway for LLM-based recommender systems.

๐Ÿ“ Abstract
Recently, Large Language Models (LLMs) such as ChatGPT have showcased remarkable abilities in solving general tasks, demonstrating their potential for application in recommender systems. To assess how effectively LLMs can be used in recommendation tasks, our study primarily focuses on employing LLMs as recommender systems through prompt engineering. We propose a general framework for utilizing LLMs in recommendation tasks, focusing on the capabilities of LLMs as recommenders. To conduct our analysis, we formalize the input of LLMs for recommendation into natural language prompts with two key aspects, and explain how our framework can be generalized to various recommendation scenarios. Regarding the use of LLMs as recommenders, we analyze the impact of public availability, tuning strategies, model architecture, parameter scale, and context length on recommendation results based on the classification of LLMs. Regarding prompt engineering, we further analyze the impact of four important components of prompts, i.e., task descriptions, user interest modeling, candidate item construction, and prompting strategies. In each section, we first define and categorize concepts in line with the existing literature. Then, we propose inspiring research questions followed by detailed experiments on two public datasets, in order to systematically analyze the impact of different factors on performance. Based on our empirical analysis, we finally summarize promising directions to shed light on future research.
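The abstract's four prompt components can be pictured as a simple composition step. Below is a minimal sketch, not the authors' code: the function name, wording, and zero-shot formatting are illustrative assumptions about how such a prompt might be assembled for an LLM recommender.

```python
def build_recommendation_prompt(user_history, candidates, top_k=3):
    """Assemble a ranking prompt from the paper's four components.
    All concrete wording here is hypothetical, not taken from the paper."""
    # 1. Task description: frame the model as a recommender.
    task = (f"You are a recommender system. Rank the candidate movies "
            f"and return the top {top_k} for this user.")
    # 2. User interest modeling: summarize the interaction history.
    interests = "The user recently watched: " + "; ".join(user_history) + "."
    # 3. Candidate item construction: enumerate the candidate pool.
    items = "\n".join(f"{i + 1}. {title}" for i, title in enumerate(candidates))
    # 4. Prompting strategy: plain zero-shot with an output-format hint.
    strategy = "Answer with the item numbers only, most relevant first."
    return "\n".join([task, interests, "Candidates:", items, strategy])

prompt = build_recommendation_prompt(
    user_history=["The Matrix", "Blade Runner"],
    candidates=["Inception", "Notting Hill", "Ex Machina", "Frozen"],
)
print(prompt)
```

The resulting string would then be sent to an LLM; varying any one component while holding the others fixed mirrors the controlled, per-component analysis the study describes.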
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Recommendation Systems
Performance Evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models
Prompt Engineering
Recommendation Systems
Lanling Xu
Gaoling School of Artificial Intelligence, Renmin University of China, China
Junjie Zhang
Gaoling School of Artificial Intelligence, Renmin University of China, China
Bingqian Li
Gaoling School of Artificial Intelligence, Renmin University of China, China
Jinpeng Wang
Meituan Group, China
Sheng Chen
Meituan Group, China
Wayne Xin Zhao
Professor, Renmin University of China
Recommender System · Natural Language Processing · Large Language Model
Ji-Rong Wen
Gaoling School of Artificial Intelligence, Renmin University of China
Large Language Model · Web Search · Information Retrieval · Machine Learning