HF4Rec: Human-Like Feedback-Driven Optimization Framework for Explainable Recommendation

📅 2025-04-19
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the scarcity of human feedback and the difficulty of optimizing explanation quality in explainable recommendation, this paper proposes a dynamic interactive optimization framework leveraging large language models (LLMs) to simulate human feedback. Methodologically, it integrates offline policy learning with a replay buffer to enable stable and efficient training. The key contributions are threefold: (1) the first formulation of an LLM as a differentiable human feedback simulator; (2) a human-guided, customizable reward scoring mechanism that aligns explanations with user intent; and (3) Pareto multi-objective optimization to jointly enhance explanation diversity, faithfulness, and readability. Evaluated on four benchmark datasets, the framework achieves significant improvements—+12.7% ROUGE-L and +9.3% F1 in explanation quality, and +4.1% NDCG@10 in recommendation accuracy—while preserving both user personalization and model generalizability.

Technology Category

Application Category

📝 Abstract
Recent advancements in explainable recommendation have greatly bolstered user experience by elucidating the decision-making rationale. However, the existing methods actually fail to provide effective feedback signals for potentially better or worse generated explanations due to their reliance on traditional supervised learning paradigms in sparse interaction data. To address these issues, we propose a novel human-like feedback-driven optimization framework. This framework employs a dynamic interactive optimization mechanism for achieving human-centered explainable requirements without incurring high labor costs. Specifically, we propose to utilize large language models (LLMs) as human simulators to predict human-like feedback for guiding the learning process. To enable the LLMs to deeply understand the task essence and meet user's diverse personalized requirements, we introduce a human-induced customized reward scoring method, which helps stimulate the language understanding and logical reasoning capabilities of LLMs. Furthermore, considering the potential conflicts between different perspectives of explanation quality, we introduce a principled Pareto optimization that transforms the multi-perspective quality enhancement task into a multi-objective optimization problem for improving explanation performance. At last, to achieve efficient model training, we design an off-policy optimization pipeline. By incorporating a replay buffer and addressing the data distribution biases, we can effectively improve data utilization and enhance model generality. Extensive experiments on four datasets demonstrate the superiority of our approach.
Problem

Research questions and friction points this paper is trying to address.

Optimizing explainable recommendations with human-like feedback
Using LLMs to simulate human feedback for personalized requirements
Resolving multi-perspective conflicts via Pareto optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs simulate human feedback for learning guidance
Human-induced reward scoring enhances LLM reasoning
Pareto optimization resolves multi-perspective quality conflicts
🔎 Similar Papers
No similar papers found.
Jiakai Tang
Jiakai Tang
Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China
Recommender SystemsMulti-Agent Systems
J
Jingsen Zhang
Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China
Zihang Tian
Zihang Tian
Doctor at Gaoling School of AI
LLM-Based Agent
X
Xueyang Feng
Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China
L
Lei Wang
Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China
X
Xu Chen
Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China