🤖 AI Summary
Problem: Standard RLHF fits a single reward model to the entire user population, so LLM assistants aligned with it struggle to adapt to users with differing preferences. Method: This paper proposes PLUS (Preference Learning Using Summarization), a framework that conditions the reward model on learned text summaries of each user's preferences, characteristics, and past conversations. PLUS trains the user-summarization model with reinforcement learning and updates the reward model at the same time, forming an online co-adaptation loop. Contribution/Results: Evaluated on several pluralistic user datasets, PLUS is robust to unseen users and to diverse conversation topics. Its user summaries also transfer zero-shot to stronger proprietary models (e.g., GPT-4) without task-specific fine-tuning, and because the summaries are short, readable text, users can inspect and edit them.
📝 Abstract
As everyday use cases of large language model (LLM) AI assistants have expanded, it is becoming increasingly important to personalize responses so that they align with different users' preferences and goals. While reinforcement learning from human feedback (RLHF) is effective at making LLMs generally more helpful and fluent, it does not account for variability across users, as it models the entire user population with a single reward model. We present a novel framework, Preference Learning Using Summarization (PLUS), that learns text-based summaries of each user's preferences, characteristics, and past conversations. These summaries condition the reward model, enabling it to make personalized predictions about the types of responses valued by each user. We train the user-summarization model with reinforcement learning and update the reward model simultaneously, creating an online co-adaptation loop. We show that, in contrast with prior personalized RLHF techniques and with in-context learning of user information, summaries produced by PLUS capture meaningful aspects of a user's preferences. Across different pluralistic user datasets, we show that our method is robust to new users and diverse conversation topics. Additionally, we demonstrate that the textual summaries generated about users can be transferred for zero-shot personalization of stronger, proprietary models like GPT-4. The resulting user summaries are not only concise and portable but also easy for users to interpret and modify, allowing for more transparency and user control in LLM alignment.
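To make the idea of a summary-conditioned reward model concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes the Bradley-Terry preference likelihood that is common in RLHF reward modeling (the abstract does not name the objective), and `toy_reward` is a hypothetical word-overlap scorer standing in for a learned neural reward model. All function and variable names are illustrative.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def toy_reward(summary: str, response: str) -> float:
    """Hypothetical stand-in for a learned reward model r(summary, response).

    Scores a response by how many words it shares with the user summary.
    A real reward model would be a neural network; this is only a toy.
    """
    summary_words = set(summary.lower().split())
    response_words = set(response.lower().split())
    return float(len(summary_words & response_words))


def preference_probability(summary: str, chosen: str, rejected: str) -> float:
    """Bradley-Terry probability (an assumption, not stated in the abstract)
    that this user prefers `chosen` over `rejected`, given their summary."""
    return sigmoid(toy_reward(summary, chosen) - toy_reward(summary, rejected))


def preference_loss(summary: str, chosen: str, rejected: str) -> float:
    """Negative log-likelihood of one observed preference pair."""
    return -math.log(preference_probability(summary, chosen, rejected))


# Two hypothetical users judging the same pair of responses.
summary_a = "prefers concise answers"
summary_b = "prefers detailed answers with examples"
short_reply = "a concise reply"
long_reply = "a detailed reply with examples"

p_a = preference_probability(summary_a, short_reply, long_reply)
p_b = preference_probability(summary_b, short_reply, long_reply)
print(f"user A prefers the short reply with probability {p_a:.3f}")
print(f"user B prefers the short reply with probability {p_b:.3f}")
```

The same response pair receives opposite predicted preferences depending only on the summary text, which is the behavior a single population-level reward model cannot express. Because the conditioning input is plain text, editing a summary by hand changes the prediction directly.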