POPI: Personalizing LLMs via Optimized Natural Language Preference Inference

📅 2025-10-17
🤖 AI Summary
Large language models (LLMs) excel on standard benchmarks but struggle to accommodate individual users' preferences regarding style, tone, and reasoning patterns. Existing alignment methods such as RLHF and DPO optimize for population-level averages, per-user fine-tuning incurs prohibitive computational costs, and context-based prompting that prepends raw user signals is noisy and inefficient. To address this, the authors propose POPI, a framework that uses optimized natural language summaries as compact, transparent, and transferable representations of user preferences. A preference inference model distills heterogeneous user signals into concise summaries, which then condition a shared generation model; both models are jointly optimized under a unified reinforcement learning objective so that summaries maximally encode useful preference information. Across four personalization benchmarks, POPI consistently improves personalization accuracy, substantially reduces context overhead, and transfers its summaries to frozen off-the-shelf LLMs for plug-and-play personalization without weight updates.
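
To make the two-stage design concrete, here is a minimal sketch of the inference-then-generation pipeline, assuming a generic text-in/text-out `generate` backend; the function names and prompt templates are illustrative assumptions, not the paper's actual prompts.

```python
from typing import Callable, List

# Any text-in/text-out LLM backend can be plugged in here (local or hosted).
GenerateFn = Callable[[str], str]

def infer_preference_summary(generate: GenerateFn, user_signals: List[str]) -> str:
    """Distill raw user signals (past queries, chosen vs. rejected responses,
    profile snippets) into a concise natural-language preference summary."""
    prompt = (
        "Summarize this user's preferences regarding style, tone, and "
        "reasoning mode in a few sentences:\n\n" + "\n".join(user_signals)
    )
    return generate(prompt)

def personalized_response(generate: GenerateFn, summary: str, query: str) -> str:
    """Condition a shared generation model on the inferred summary."""
    prompt = (
        f"User preference summary: {summary}\n\n"
        f"Answer the following query in line with these preferences:\n{query}"
    )
    return generate(prompt)
```

Because the summary is plain text, any model that accepts a prompt can consume it, which is what makes it a transferable interface between the inference model and different generators.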

📝 Abstract
Large language models (LLMs) achieve strong benchmark performance, yet user experiences remain inconsistent due to diverse preferences in style, tone, and reasoning mode. Nevertheless, existing alignment techniques such as reinforcement learning from human feedback (RLHF) or Direct Preference Optimization (DPO) largely optimize toward population-level averages and overlook individual variation. Naive personalization strategies like per-user fine-tuning are computationally prohibitive, and in-context approaches that prepend raw user signals often suffer from inefficiency and noise. To address these challenges, we propose POPI, a general framework that introduces a preference inference model to distill heterogeneous user signals into concise natural language summaries. These summaries act as transparent, compact, and transferable personalization representations that condition a shared generation model to produce personalized responses. POPI jointly optimizes both preference inference and personalized generation under a unified objective using reinforcement learning, ensuring summaries maximally encode useful preference information. Extensive experiments across four personalization benchmarks demonstrate that POPI consistently improves personalization accuracy while reducing context overhead by a large margin. Moreover, optimized summaries seamlessly transfer to frozen off-the-shelf LLMs, enabling plug-and-play personalization without weight updates.
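
The "unified objective" can be pictured as a reward for the inference model that is measured through the generation model. The sketch below shows one plausible instantiation, a DPO-style pairwise margin under the summary-conditioned generator; `gen_model` and `tokenizer` are assumed to be a Hugging Face causal LM and its tokenizer, and the paper's exact objective may differ.

```python
import torch
import torch.nn.functional as F

def summary_reward(gen_model, tokenizer, summary, query, chosen, rejected):
    """Score a preference summary by how much it shifts the generation model
    toward the user's chosen response over the rejected one (a DPO-style
    pairwise margin). Illustrative: the paper's exact objective may differ."""

    def seq_logprob(response: str) -> torch.Tensor:
        # Condition the generator on the summary, then score the response.
        prompt = f"Preference summary: {summary}\nQuery: {query}\nResponse: "
        prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.size(1)
        ids = tokenizer(prompt + response, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = gen_model(ids).logits
        # Log-probability of each next token; position i predicts ids[i + 1].
        logp = F.log_softmax(logits[:, :-1], dim=-1)
        token_logp = logp.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
        # Sum only over response tokens (boundary tokenization is approximate).
        return token_logp[:, prompt_len - 1:].sum()

    return (seq_logprob(chosen) - seq_logprob(rejected)).item()
```

In a full training loop this scalar would serve as the reward for reinforcement-learning updates to the preference inference model, while the generation model is simultaneously trained to exploit the summaries it is given.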
Problem

Research questions and friction points this paper is trying to address.

Personalizing LLMs for individual user preferences in style and tone
Overcoming inefficiency of per-user fine-tuning and noisy context signals
Creating transferable preference summaries to condition shared generation models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Distills user signals into natural language summaries
Optimizes preference inference and generation jointly
Enables plug-and-play personalization for frozen LLMs (see the sketch after this list)
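
Plug-and-play transfer follows directly from the summary being plain text: it can be prepended to a frozen chat model's system prompt with no weight updates. This minimal sketch uses the OpenAI client as an illustrative stand-in; the model name and prompt wording are assumptions.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def personalized_chat(summary: str, query: str, model: str = "gpt-4o-mini") -> str:
    """Prepend an optimized preference summary to a frozen chat model's
    system prompt; no fine-tuning or weight access is required."""
    response = client.chat.completions.create(
        model=model,  # illustrative model name
        messages=[
            {
                "role": "system",
                "content": "Tailor your answers to this user. "
                           f"Known preferences: {summary}",
            },
            {"role": "user", "content": query},
        ],
    )
    return response.choices[0].message.content
```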
👥 Authors
Yizhuo Chen (University of Illinois Urbana-Champaign)
Xin Liu (Amazon)
Ruijie Wang (Amazon)
Zheng Li (Amazon)
Pei Chen (Amazon)
Changlong Yu (Amazon)
Priyanka Nigam (Amazon)
Meng Jiang (University of Notre Dame)
Bing Yin (Amazon)