MTRec: Learning to Align with User Preferences via Mental Reward Models

📅 2025-09-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Implicit feedback (e.g., clicks) often misaligns with users' true preferences, inducing recommendation bias. To address this, we propose MTRec, a novel framework that integrates mental reward modeling with distributional inverse reinforcement learning (IRL) to infer users' latent satisfaction from sparse implicit signals, moving beyond surface-level behavioral proxies. Methodologically, MTRec constructs an interpretable mental reward model to characterize deep user preferences and employs distributional IRL to estimate the underlying reward function. The learned reward is then used to guide a sequential recommendation model, enabling end-to-end preference-aligned optimization. Extensive experiments on multiple benchmark datasets demonstrate significant improvements in recommendation accuracy, and deployment on a large-scale short-video platform increased average user watch time by 7%.

📝 Abstract
Recommendation models are predominantly trained using implicit user feedback, since explicit feedback is often costly to obtain. However, implicit feedback, such as clicks, does not always reflect users' real preferences. For example, a user might click on a news article because of its attractive headline, but end up feeling uncomfortable after reading the content. In the absence of explicit feedback, such erroneous implicit signals may severely mislead recommender systems. In this paper, we propose MTRec, a novel sequential recommendation framework designed to align with real user preferences by uncovering their internal satisfaction on recommended items. Specifically, we introduce a mental reward model to quantify user satisfaction and propose a distributional inverse reinforcement learning approach to learn it. The learned mental reward model is then used to guide recommendation models to better align with users' real preferences. Our experiments show that MTRec brings significant improvements to a variety of recommendation models. We also deploy MTRec on an industrial short video platform and observe a 7 percent increase in average user viewing time.
Problem

Research questions and friction points this paper is trying to address.

Aligning recommendations with users' true preferences using implicit feedback
Quantifying user satisfaction through mental reward modeling
Correcting misleading signals from clicks to improve recommendation accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mental reward model quantifies user satisfaction
Distributional inverse reinforcement learning learns preferences
Learned mental rewards guide recommendation alignment
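The alignment idea in the bullets above — using a learned mental reward to down-weight misleading clicks during training — can be sketched as follows. This is a minimal illustration, not the paper's architecture: the linear reward head, embedding shapes, and softmax sample weighting are all assumptions standing in for the learned mental reward model and the distributional IRL procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (illustrative shapes, not from the paper): item embeddings,
# a user state vector, and a click log where some clicks are "regretted".
n_items, dim = 50, 8
item_emb = rng.normal(size=(n_items, dim))
user_state = rng.normal(size=dim)

def mental_reward(state, item_vec, w):
    # Hypothetical linear reward head standing in for the learned
    # mental reward model: r(s, a) = w . [s ; a].
    return float(w @ np.concatenate([state, item_vec]))

# Pretend distributional IRL has already produced reward weights w
# (here a random stand-in).
w = rng.normal(size=2 * dim)

# Score each clicked item with the mental reward model.
clicked = rng.integers(0, n_items, size=10)
rewards = np.array([mental_reward(user_state, item_emb[i], w) for i in clicked])

# Turn rewards into nonnegative training weights (softmax), so clicks the
# reward model judges as unsatisfying contribute less to the click loss.
weights = np.exp(rewards - rewards.max())
weights /= weights.sum()

print(weights.shape, round(float(weights.sum()), 6))
```

In a full pipeline these weights would multiply the per-example loss of the recommendation model, steering it toward clicks that reflect genuine satisfaction rather than attractive-but-regretted items.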
Mengchen Zhao
South China University of Technology
Reinforcement Learning · Multi-Agent Systems · Generative Decision Making · LLM Agents
Yifan Gao
School of Computer Science and Technology, Dalian University of Technology
Yaqing Hou
School of Computer Science and Technology, Dalian University of Technology
Xiangyang Li
Huawei Noah’s Ark Lab
Pengjie Gu
Nanyang Technological University
Reinforcement Learning · Alignment · Spiking Neural Networks
Zhenhua Dong
Noah's Ark Lab, Huawei Technologies Co., Ltd.
Recommender Systems · Causal Inference · Counterfactual Learning · Trustworthy AI · Machine Learning
Ruiming Tang
Huawei Noah’s Ark Lab
Yi Cai
School of Software Engineering, South China University of Technology