🤖 AI Summary
Existing LLM-based recommender systems use negative samples inefficiently: naively aggregating large numbers of negatives improves ranking accuracy and mitigates popularity bias but incurs substantial computational and memory overhead, and treating all negatives as equally informative limits optimization efficacy. This paper proposes an efficient preference optimization framework with two core innovations: (1) in-batch negative sample sharing, which scales up the effective number of negatives without a proportional cost increase; and (2) dynamic reward margin adjustment, which weights samples by informativeness to guide more effective learning. The method unifies preference optimization, contrastive learning, and dynamic-margin reinforcement learning. Evaluated on three public benchmarks, it significantly outperforms state-of-the-art approaches, achieving higher recommendation accuracy while more effectively suppressing popularity bias.
📝 Abstract
Recommendation systems leverage user interaction data to suggest relevant items while filtering out irrelevant (negative) ones. The rise of large language models (LLMs) has garnered increasing attention for their potential in recommendation tasks. However, existing methods for optimizing LLM-based recommenders face challenges in effectively utilizing negative samples. Simply integrating large numbers of negative samples can improve ranking accuracy and mitigate popularity bias but often leads to increased computational overhead and memory costs. Additionally, current approaches fail to account for the varying informativeness of negative samples, leading to suboptimal optimization performance. To address these issues, we propose NAPO (**N**egative-**A**ware **P**reference **O**ptimization), an enhanced framework for preference optimization in LLM-based recommendation. NAPO introduces two key innovations: (1) in-batch negative sharing, which expands the pool of negative samples without additional memory overhead, and (2) dynamic reward margin adjustment, which adapts model updates based on the confidence of negative samples. Extensive experiments on three public datasets demonstrate that NAPO outperforms existing methods in both recommendation accuracy and popularity bias reduction.
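To make the two innovations concrete, here is a minimal sketch of a DPO-style pairwise preference loss that (1) reuses every negative in the batch for every positive and (2) scales the reward margin by the model's confidence in each negative. This is a hypothetical formulation for illustration only; the function name `napo_loss`, the margin function, and the hyperparameters `beta` and `gamma` are assumptions, not the paper's exact definitions.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def napo_loss(pos_logps, neg_logps, beta=1.0, gamma=1.0):
    """Illustrative DPO-style loss with in-batch negative sharing and a
    dynamic reward margin (assumed formulation, not the paper's exact one).

    pos_logps: per-user log-probabilities of each positive item
    neg_logps: per-user log-probabilities of each sampled negative item
    """
    total, count = 0.0, 0
    for pos in pos_logps:
        # In-batch sharing: every user's negative is reused as a negative
        # for every positive in the batch, so B sampled negatives yield
        # B*B preference pairs with no extra forward passes.
        for neg in neg_logps:
            # Dynamic margin: a "confident" negative (higher log-prob under
            # the model) gets a larger margin, so informative negatives
            # push the update harder than easy, low-probability ones.
            margin = gamma * sigmoid(neg)
            logit = beta * (pos - neg) - margin
            total += -math.log(sigmoid(logit))  # standard logistic loss
            count += 1
    return total / count
```

In a real implementation the double loop would be a single broadcasted tensor operation, which is what makes the extra negatives essentially free in memory and compute.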