POPI: Personalizing LLMs via Optimized Natural Language Preference Inference

📅 2025-10-17
🤖 AI Summary
Large language models (LLMs) excel on standard benchmarks but struggle to accommodate individual users' preferences regarding style, tone, and reasoning patterns. Existing alignment methods such as RLHF and DPO optimize for population-level averages, per-user fine-tuning incurs prohibitive computational costs, and context-based prompting that prepends raw user signals is noisy and inefficient. To address this, the authors propose POPI, a framework that uses optimized natural language summaries as compact, transparent, and transferable representations of user preferences. A preference inference model distills heterogeneous user signals into concise summaries, which then condition a shared generation model; both models are jointly optimized under a unified reinforcement learning objective so that summaries maximally encode useful preference information. Across four personalization benchmarks, POPI consistently improves personalization accuracy, substantially reduces context overhead, and transfers its summaries to frozen off-the-shelf LLMs for plug-and-play personalization without weight updates.
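
To make the two-stage design concrete, here is a minimal sketch of the inference-then-generation pipeline, assuming a generic text-in/text-out `generate` backend; the function names and prompt templates are illustrative assumptions, not the paper's actual prompts.

```python
from typing import Callable, List

# Any text-in/text-out LLM backend can be plugged in here (local or hosted).
GenerateFn = Callable[[str], str]

def infer_preference_summary(generate: GenerateFn, user_signals: List[str]) -> str:
    """Distill raw user signals (past queries, chosen vs. rejected responses,
    profile snippets) into a concise natural-language preference summary."""
    prompt = (
        "Summarize this user's preferences regarding style, tone, and "
        "reasoning mode in a few sentences:\n\n" + "\n".join(user_signals)
    )
    return generate(prompt)

def personalized_response(generate: GenerateFn, summary: str, query: str) -> str:
    """Condition a shared generation model on the inferred summary."""
    prompt = (
        f"User preference summary: {summary}\n\n"
        f"Answer the following query in line with these preferences:\n{query}"
    )
    return generate(prompt)
```

Because the summary is plain text, any model that accepts a prompt can consume it, which is what makes it a transferable interface between the inference model and different generators.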

📝 Abstract
Large language models (LLMs) achieve strong benchmark performance, yet user experiences remain inconsistent due to diverse preferences in style, tone, and reasoning mode. Nevertheless, existing alignment techniques such as reinforcement learning from human feedback (RLHF) or Direct Preference Optimization (DPO) largely optimize toward population-level averages and overlook individual variation. Naive personalization strategies like per-user fine-tuning are computationally prohibitive, and in-context approaches that prepend raw user signals often suffer from inefficiency and noise. To address these challenges, we propose POPI, a general framework that introduces a preference inference model to distill heterogeneous user signals into concise natural language summaries. These summaries act as transparent, compact, and transferable personalization representations that condition a shared generation model to produce personalized responses. POPI jointly optimizes both preference inference and personalized generation under a unified objective using reinforcement learning, ensuring summaries maximally encode useful preference information. Extensive experiments across four personalization benchmarks demonstrate that POPI consistently improves personalization accuracy while reducing context overhead by a large margin. Moreover, optimized summaries seamlessly transfer to frozen off-the-shelf LLMs, enabling plug-and-play personalization without weight updates.
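
The "unified objective" can be pictured as a reward for the inference model that is measured through the generation model. The sketch below shows one plausible instantiation, a DPO-style pairwise margin under the summary-conditioned generator; `gen_model` and `tokenizer` are assumed to be a Hugging Face causal LM and its tokenizer, and the paper's exact objective may differ.

```python
import torch
import torch.nn.functional as F

def summary_reward(gen_model, tokenizer, summary, query, chosen, rejected):
    """Score a preference summary by how much it shifts the generation model
    toward the user's chosen response over the rejected one (a DPO-style
    pairwise margin). Illustrative: the paper's exact objective may differ."""

    def seq_logprob(response: str) -> torch.Tensor:
        # Condition the generator on the summary, then score the response.
        prompt = f"Preference summary: {summary}\nQuery: {query}\nResponse: "
        prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.size(1)
        ids = tokenizer(prompt + response, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = gen_model(ids).logits
        # Log-probability of each next token; position i predicts ids[i + 1].
        logp = F.log_softmax(logits[:, :-1], dim=-1)
        token_logp = logp.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
        # Sum only over response tokens (boundary tokenization is approximate).
        return token_logp[:, prompt_len - 1:].sum()

    return (seq_logprob(chosen) - seq_logprob(rejected)).item()
```

In a full training loop this scalar would serve as the reward for reinforcement-learning updates to the preference inference model, while the generation model is simultaneously trained to exploit the summaries it is given.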
Problem

Research questions and friction points this paper is trying to address.

Personalizing LLMs for individual user preferences in style and tone
Overcoming inefficiency of per-user fine-tuning and noisy context signals
Creating transferable preference summaries to condition shared generation models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Distills user signals into natural language summaries
Optimizes preference inference and generation jointly
Enables plug-and-play personalization for frozen LLMs (see the sketch after this list)
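
Plug-and-play transfer follows directly from the summary being plain text: it can be prepended to a frozen chat model's system prompt with no weight updates. This minimal sketch uses the OpenAI client as an illustrative stand-in; the model name and prompt wording are assumptions.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def personalized_chat(summary: str, query: str, model: str = "gpt-4o-mini") -> str:
    """Prepend an optimized preference summary to a frozen chat model's
    system prompt; no fine-tuning or weight access is required."""
    response = client.chat.completions.create(
        model=model,  # illustrative model name
        messages=[
            {
                "role": "system",
                "content": "Tailor your answers to this user. "
                           f"Known preferences: {summary}",
            },
            {"role": "user", "content": query},
        ],
    )
    return response.choices[0].message.content
```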
👥 Authors
Yizhuo Chen (University of Illinois Urbana-Champaign)
Xin Liu (Amazon)
Ruijie Wang (Amazon)
Zheng Li (Amazon)
Pei Chen (Amazon)
Changlong Yu (Amazon)
Priyanka Nigam (Amazon)
Meng Jiang (University of Notre Dame)
Bing Yin (Amazon)