Learning Pluralistic User Preferences through Reinforcement Learning Fine-tuned Summaries

📅 2025-07-17
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing LLM assistants struggle to adapt to heterogeneous user populations. Method: This paper proposes PLUS (Preference Learning Using Summarization), a framework that conditions the reward model on textual summaries of each user's preferences, characteristics, and past conversations, enabling interpretable, transferable, zero-shot personalized RLHF. PLUS trains the user-summarization model with reinforcement learning while simultaneously updating the summary-conditioned reward model, forming an online co-adaptation loop. Contribution/Results: Across several pluralistic user datasets, PLUS is robust to unseen users and new conversation topics. Critically, its generated user summaries transfer zero-shot to stronger proprietary models (e.g., GPT-4), improving personalization quality and user controllability without task-specific fine-tuning.
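The core idea above is a reward model conditioned on a user summary. The following is a toy sketch of that conditioning, with a keyword-overlap heuristic standing in for the learned neural reward model the paper actually trains (all names and the scoring rule are illustrative, not the paper's implementation):

```python
# Toy stand-in for a summary-conditioned reward model r(prompt, response | summary).
# PLUS learns this with a neural network; here, simple token overlap between the
# user summary and the candidate response serves as the "personalized" score.

def reward(user_summary: str, prompt: str, response: str) -> float:
    """Score a response higher when it reflects traits in the user summary."""
    summary_tokens = set(user_summary.lower().split())
    response_tokens = set(response.lower().split())
    if not response_tokens:
        return 0.0
    return len(summary_tokens & response_tokens) / len(response_tokens)

summary = "prefers concise technical answers with code examples"
r1 = reward(summary, "Explain decorators", "concise technical answer with code")
r2 = reward(summary, "Explain decorators", "a long story about my weekend")
assert r1 > r2  # the model prefers responses matching this user's summary
```

The point of the sketch is only the interface: the same (prompt, response) pair receives different rewards for different users, because the user summary enters as an extra conditioning input.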

📝 Abstract
As everyday use cases of large language model (LLM) AI assistants have expanded, it is becoming increasingly important to personalize responses to align with different users' preferences and goals. While reinforcement learning from human feedback (RLHF) is effective at improving LLMs to be generally more helpful and fluent, it does not account for variability across users, as it models the entire user population with a single reward model. We present a novel framework, Preference Learning Using Summarization (PLUS), that learns text-based summaries of each user's preferences, characteristics, and past conversations. These summaries condition the reward model, enabling it to make personalized predictions about the types of responses valued by each user. We train the user-summarization model with reinforcement learning, and update the reward model simultaneously, creating an online co-adaptation loop. We show that in contrast with prior personalized RLHF techniques or with in-context learning of user information, summaries produced by PLUS capture meaningful aspects of a user's preferences. Across different pluralistic user datasets, we show that our method is robust to new users and diverse conversation topics. Additionally, we demonstrate that the textual summaries generated about users can be transferred for zero-shot personalization of stronger, proprietary models like GPT-4. The resulting user summaries are not only concise and portable but also easy for users to interpret and modify, allowing for more transparency and user control in LLM alignment.
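The zero-shot transfer described in the abstract amounts to carrying the textual user summary into the prompt of a stronger model. A minimal sketch of how that hand-off could look, assuming a standard chat-message format (the function name and prompt wording are hypothetical, not from the paper):

```python
# Hedged sketch: zero-shot personalization by injecting a PLUS-style user
# summary into the context of a stronger model. No API call is made here;
# the messages list follows the common system/user chat-message convention.

def build_personalized_prompt(user_summary: str, user_query: str) -> list:
    """Compose chat messages that carry the user summary as system context."""
    return [
        {"role": "system",
         "content": "Tailor your answer to this user profile: " + user_summary},
        {"role": "user", "content": user_query},
    ]

messages = build_personalized_prompt(
    "prefers short answers, is an expert Python programmer",
    "How do I read a CSV file?",
)
assert messages[0]["role"] == "system"
assert "expert Python" in messages[0]["content"]
```

Because the summary is plain text, a user can read and edit it before it is injected, which is the transparency and controllability benefit the abstract highlights.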
Problem

Research questions and friction points this paper is trying to address.

Personalizing LLM responses to diverse user preferences
Learning user-specific summaries for reward model adaptation
Enabling transparent and portable user preference summaries
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement learning fine-tunes personalized user summaries
Summaries condition reward model for personalized predictions
Textual summaries enable zero-shot personalization of models
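The co-adaptation idea in the bullets above can be caricatured as a loop in which the summarizer is nudged by how well the reward model does when conditioned on its summaries. A deterministic toy sketch, with a "budget" parameter standing in for the summarizer's RL-tuned policy (all functions are illustrative stand-ins, not the paper's training procedure):

```python
# Toy sketch of the online co-adaptation loop: the summarizer's output is
# adjusted (here, by growing a summary budget) whenever the reward model's
# loss on preference data stays high. Real PLUS updates both neural models.

def summarize(history: list, budget: int) -> list:
    """Stand-in for the RL-trained summarizer: keep the first `budget` items."""
    return history[:budget]

def rm_loss(summary: list, prefs: list) -> float:
    """Stand-in reward-model loss: lower when the summary covers more
    of the user's true preferences."""
    covered = sum(p in summary for p in prefs)
    return 1.0 - covered / len(prefs)

history = ["likes brevity", "codes in Python", "dislikes jargon"]
prefs = ["likes brevity", "dislikes jargon"]

budget = 1
for _ in range(5):
    loss = rm_loss(summarize(history, budget), prefs)
    if loss > 0:
        budget += 1  # RL-style signal: expand the summary until loss drops

assert rm_loss(summarize(history, budget), prefs) == 0.0
```

The sketch captures only the feedback direction: the summarizer is rewarded for producing summaries that make the reward model's personalized predictions more accurate, and the two improve together.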