HF4Rec: Human-Like Feedback-Driven Optimization Framework for Explainable Recommendation

📅 2025-04-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the scarcity of human feedback and the difficulty of optimizing explanation quality in explainable recommendation, this paper proposes a dynamic interactive optimization framework leveraging large language models (LLMs) to simulate human feedback. Methodologically, it integrates offline policy learning with a replay buffer to enable stable and efficient training. The key contributions are threefold: (1) the first formulation of an LLM as a differentiable human feedback simulator; (2) a human-guided, customizable reward scoring mechanism that aligns explanations with user intent; and (3) Pareto multi-objective optimization to jointly enhance explanation diversity, faithfulness, and readability. Evaluated on four benchmark datasets, the framework achieves significant improvements—+12.7% ROUGE-L and +9.3% F1 in explanation quality, and +4.1% NDCG@10 in recommendation accuracy—while preserving both user personalization and model generalizability.

Technology Category

Application Category

📝 Abstract

Recent advancements in explainable recommendation have greatly bolstered user experience by elucidating the decision-making rationale. However, the existing methods actually fail to provide effective feedback signals for potentially better or worse generated explanations due to their reliance on traditional supervised learning paradigms in sparse interaction data. To address these issues, we propose a novel human-like feedback-driven optimization framework. This framework employs a dynamic interactive optimization mechanism for achieving human-centered explainable requirements without incurring high labor costs. Specifically, we propose to utilize large language models (LLMs) as human simulators to predict human-like feedback for guiding the learning process. To enable the LLMs to deeply understand the task essence and meet user's diverse personalized requirements, we introduce a human-induced customized reward scoring method, which helps stimulate the language understanding and logical reasoning capabilities of LLMs. Furthermore, considering the potential conflicts between different perspectives of explanation quality, we introduce a principled Pareto optimization that transforms the multi-perspective quality enhancement task into a multi-objective optimization problem for improving explanation performance. At last, to achieve efficient model training, we design an off-policy optimization pipeline. By incorporating a replay buffer and addressing the data distribution biases, we can effectively improve data utilization and enhance model generality. Extensive experiments on four datasets demonstrate the superiority of our approach.

Problem

Research questions and friction points this paper is trying to address.

Optimizing explainable recommendations with human-like feedback

Using LLMs to simulate human feedback for personalized requirements

Resolving multi-perspective conflicts via Pareto optimization

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs simulate human feedback for learning guidance

Human-induced reward scoring enhances LLM reasoning

Pareto optimization resolves multi-perspective quality conflicts

🔎 Similar Papers

No similar papers found.

Authors to Follow