Exploring Diversity, Novelty, and Popularity Bias in ChatGPT's Recommendations

📅 2026-01-05
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the lack of systematic evaluation of large language models (LLMs) in recommendation systems beyond accuracy, particularly with respect to diversity, novelty, and popularity bias. It presents the first comprehensive assessment of ChatGPT-3.5 and ChatGPT-4 in Top-N recommendation and cold-start scenarios, leveraging three real-world datasets and integrating both traditional recommendation metrics and non-accuracy dimensions. The results demonstrate that ChatGPT-4 matches or even surpasses conventional recommender models in diversity and novelty, while significantly improving both accuracy and novelty in cold-start settings. Furthermore, the analysis reveals nuanced behaviors of ChatGPT-4 in either mitigating or exacerbating popularity bias, offering critical empirical insights into the potential and limitations of LLMs for recommendation tasks.

📝 Abstract
ChatGPT has emerged as a versatile tool, demonstrating capabilities across diverse domains. Given these successes, the Recommender Systems (RSs) community has begun investigating its applications within recommendation scenarios, primarily focusing on accuracy. While the integration of ChatGPT into RSs has garnered significant attention, a comprehensive analysis of its performance across various dimensions remains largely unexplored. Specifically, its ability to provide diverse and novel recommendations, and its susceptibility to biases such as popularity bias, have not been thoroughly examined. As the use of these models continues to expand, understanding these aspects is crucial for enhancing user satisfaction and achieving long-term personalization. This study investigates the recommendations provided by ChatGPT-3.5 and ChatGPT-4, assessing their capabilities in terms of diversity, novelty, and popularity bias. We evaluate these models on three distinct datasets and assess their performance in Top-N recommendation and cold-start scenarios. The findings reveal that ChatGPT-4 matches or surpasses traditional recommenders, demonstrating the ability to balance novelty and diversity in its recommendations. Furthermore, in the cold-start scenario, ChatGPT models exhibit superior performance in both accuracy and novelty, suggesting they can be particularly beneficial for new users. This research highlights the strengths and limitations of ChatGPT's recommendations, offering new perspectives on the capacity of these models to provide recommendations beyond accuracy-focused metrics.
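The beyond-accuracy dimensions the study evaluates can be illustrated with standard metric definitions. Below is a minimal Python sketch (with a hypothetical toy catalog, not data from the paper, and not necessarily the paper's exact metric formulations) computing novelty as mean self-information of recommended items and intra-list diversity as the mean pairwise Jaccard distance over item genre sets:

```python
import math
from itertools import combinations

def novelty(rec_list, item_popularity, total_interactions):
    """Mean self-information of a Top-N list: rarer items score higher."""
    return sum(-math.log2(item_popularity[i] / total_interactions)
               for i in rec_list) / len(rec_list)

def intra_list_diversity(rec_list, item_genres):
    """Mean pairwise Jaccard distance over item attribute sets."""
    def jaccard_dist(a, b):
        ga, gb = item_genres[a], item_genres[b]
        return 1 - len(ga & gb) / len(ga | gb)
    pairs = list(combinations(rec_list, 2))
    return sum(jaccard_dist(a, b) for a, b in pairs) / len(pairs)

# Hypothetical toy catalog: interaction counts and genre sets.
popularity = {"A": 500, "B": 50, "C": 5}
genres = {"A": {"action"}, "B": {"action", "drama"}, "C": {"comedy"}}
top_n = ["A", "B", "C"]

print(round(novelty(top_n, popularity, 1000), 3))             # 4.322
print(round(intra_list_diversity(top_n, genres), 3))          # 0.833
```

Under these definitions, a list skewed toward blockbusters yields low novelty (popularity bias), while attribute-homogeneous lists yield low diversity.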
Problem

Research questions and friction points this paper is trying to address.

Tags: diversity, novelty, popularity bias, recommendation systems, ChatGPT

Innovation

Methods, ideas, or system contributions that make the work stand out.

Tags: diversity, novelty, popularity bias, cold-start, large language models
Dario Di Palma
Ph.D. Student at Politecnico di Bari
Large Language Models, Recommender Systems, Interpretability, Multi-Objective Evaluation
Giovanni Maria Biancofiore
Politecnico di Bari, Italy
V. W. Anelli
Politecnico di Bari, Italy
F. Narducci
Politecnico di Bari, Italy
T. D. Noia
Politecnico di Bari, Italy