🤖 AI Summary
Recommender systems face the dual challenges of sparse interaction data and popularity bias. To address these, we propose LLM-GCL, a novel framework that first leverages large language models (LLMs) with few-shot prompting to rerank candidate items; high-confidence synthetic interactions are then generated via majority voting over repeated rerankings, with theoretical guarantees on both the effectiveness and the distributional consistency of the augmentation. The original and synthetic interactions are jointly encoded into a heterogeneous graph, where text-description-driven graph contrastive learning (GCL) refines user and item representations. By jointly mitigating data sparsity and long-tail popularity bias, LLM-GCL achieves significant improvements in Recall@K and NDCG across multiple benchmark datasets while substantially reducing popularity bias, e.g., decreasing the Item Popularity Gap by up to 23.6%, and consistently outperforms state-of-the-art baselines.
📝 Abstract
Recommendation systems often suffer from data sparsity caused by limited user-item interactions, which degrades their performance and amplifies popularity bias in real-world scenarios. This paper proposes a novel data augmentation framework that leverages Large Language Models (LLMs) and item textual descriptions to enrich interaction data. By few-shot prompting LLMs multiple times to rerank items and aggregating the results via majority voting, we generate high-confidence synthetic user-item interactions, supported by theoretical guarantees based on concentration-of-measure arguments. To effectively leverage the augmented data in a graph recommendation system, we integrate it into a graph contrastive learning framework that mitigates distributional shift and alleviates popularity bias. Extensive experiments show that our method improves accuracy and reduces popularity bias, outperforming strong baselines.
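The majority-voting step described above, repeatedly prompting an LLM to rerank a user's candidate items and keeping only items that win a majority of runs, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `rerankings` input stands in for lists returned by repeated LLM calls, and the `k` and `threshold` values are illustrative.

```python
from collections import Counter

def majority_vote_topk(rerankings, k=3, threshold=0.5):
    """Aggregate repeated LLM rerankings into high-confidence items.

    rerankings: list of ranked candidate-item lists, one per LLM call.
    An item is kept as a synthetic interaction only if it appears in
    the top-k of more than `threshold` of the runs.
    """
    votes = Counter()
    for ranking in rerankings:
        votes.update(ranking[:k])  # one vote per top-k appearance
    n_runs = len(rerankings)
    return [item for item, c in votes.most_common() if c / n_runs > threshold]

# Three simulated rerankings from repeated few-shot prompts
runs = [
    ["item_a", "item_b", "item_c", "item_d"],
    ["item_b", "item_a", "item_d", "item_c"],
    ["item_a", "item_c", "item_b", "item_e"],
]
print(majority_vote_topk(runs, k=2, threshold=0.5))  # → ['item_a', 'item_b']
```

Repeating the prompt and voting filters out items the LLM ranks highly only by chance, which is what underpins the concentration-of-measure confidence guarantee mentioned in the abstract.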