ROLeR: Effective Reward Shaping in Offline Reinforcement Learning for Recommender Systems

📅 2024-07-18
🏛️ International Conference on Information and Knowledge Management
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the limitations in offline reinforcement learning (RL) recommendation systems—namely, inaccurate reward modeling and high model uncertainty that degrade policy performance—this paper proposes a novel framework integrating nonparametric reward shaping with adaptive uncertainty penalization. We introduce the first nonparametric reward shaping mechanism to mitigate distributional shift between offline datasets and online interactions. Additionally, we design a recommendation-specific, flexible uncertainty penalty term that jointly optimizes policy robustness and generalization. The method synergistically combines model-based RL, nonparametric regression, and confidence-interval-driven uncertainty quantification. Evaluated on four benchmark recommendation datasets, our approach achieves state-of-the-art (SOTA) performance: it improves average recall by 5.2% and NDCG@10 by 4.8% over existing offline RL recommendation methods, demonstrating significant gains in both effectiveness and reliability.

Technology Category

Application Category

📝 Abstract
Offline reinforcement learning (RL) is an effective tool for real-world recommender systems with its capacity to model the dynamic interest of users and its interactive nature. Most existing offline RL recommender systems focus on model-based RL through learning a world model from offline data and building the recommendation policy by interacting with this model. Although these methods have made progress in the recommendation performance, the effectiveness of model-based offline RL methods is often constrained by the accuracy of the estimation of the reward model and the model uncertainties, primarily due to the extreme discrepancy between offline logged data and real-world data in user interactions with online platforms. To fill this gap, a more accurate reward model and uncertainty estimation are needed for the model-based RL methods. In this paper, a novel model-based Reward Shaping in Offline Reinforcement Learning for Recommender Systems, ROLeR, is proposed for reward and uncertainty estimation in recommendation systems. Specifically, a non-parametric reward shaping method is designed to refine the reward model. In addition, a flexible and more representative uncertainty penalty is designed to fit the needs of recommendation systems. Extensive experiments conducted on four benchmark datasets showcase that ROLeR achieves state-of-the-art performance compared with existing baselines. The source code can be downloaded at https://github.com/ArronDZhang/ROLeR.
Problem

Research questions and friction points this paper is trying to address.

Improves reward model accuracy in offline RL for recommender systems
Addresses uncertainty estimation challenges in model-based RL methods
Reduces discrepancy between offline data and real-world user interactions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Non-parametric reward shaping for accurate rewards
Flexible uncertainty penalty for recommendation needs
Model-based RL with refined reward and uncertainty
🔎 Similar Papers
No similar papers found.
Y
Yi Zhang
The University of Queensland, CSIRO DATA61
Ruihong Qiu
Ruihong Qiu
ARC DECRA Fellow, Lecturer (Assistant Professor) @The University of Queensland
GraphLarge Language Models
J
Jiajun Liu
CSIRO DATA61, The University of Queensland
S
Sen Wang
The University of Queensland