Preference-Aware Rubric Learning for Personalized Evaluation

📅 2026-05-29

🤖 AI Summary
Existing methods for evaluating personalized alignment struggle to capture the subjective preferences users express over long-term interactions. This work proposes a “personalized evaluation as learning” paradigm that automatically induces high-fidelity, preference-aware evaluation rubrics from users’ historical behavior. The framework trains evaluators discriminatively by combining two components. The first is a discriminative reinforcement learning objective that contrasts user-authored responses with competing model outputs. The second is a self-validation mechanism. The approach is guided by three principles: representativeness, user consistency, and discriminativeness. On real-world personalized text generation tasks it yields rubrics that generalize across users and tasks. These rubrics reliably identify user-aligned responses and capture both stable stylistic traits and fine-grained evaluative patterns in individual preferences.
📝 Abstract
As Large Language Models (LLMs) evolve from general-purpose assistants into user-centric agents, personalization has become central to aligning model behavior with individual preferences, which makes the evaluation of personalized alignment a critical bottleneck. Existing evaluation methods, ranging from automatic metrics to LLM-as-a-judge approaches, fail to capture the subjective, user-specific preferences embedded in long-term interaction histories. We identify three essential principles for reliable and effective personalized evaluation: Representativeness, User-Consistency, and Discriminativeness. To satisfy these principles, we introduce Personalized Evaluation as Learning, a paradigm that formulates personalized evaluation as a learning problem rather than a static judgment. Under this paradigm, we propose PARL (Preference-Aware Rubric Learning for Personalized Evaluation), a framework that learns to induce preference-aware evaluation rubrics directly from raw user histories and applies a self-validation mechanism to ensure that the rubrics are consistent with the user's preferences. PARL integrates rubric induction with a discriminative reinforcement learning objective that contrasts user-authored responses against competitive personalized model outputs, enabling the learned rubrics to capture precise, user-specific decision boundaries. Experiments on real-world personalized text generation tasks show that PARL consistently induces high-fidelity rubrics that reliably identify user-aligned responses and generalize across users and tasks, while capturing stable stylistic preferences and fine-grained evaluative patterns. To ensure reproducibility, our code is available at https://github.com/SnowCharmQ/PARL.
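The abstract describes the discriminative objective only at a high level: a rubric is rewarded when it ranks the user-authored response above competing model outputs. The following is a minimal sketch of that idea, not PARL's implementation. Every name in it is hypothetical: `Criterion`, `rubric_score`, `discriminative_reward`, the weighted-average aggregation, the win-rate reward, and the `margin` parameter. In PARL the rubrics are induced from raw user histories by a learned model. Here, hand-written lambdas stand in for that step so that the reward computation runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Criterion:
    """Hypothetical rubric item: a name, a weight, and a scorer returning a value in [0, 1]."""
    name: str
    weight: float
    score: Callable[[str], float]


def rubric_score(rubric: Sequence[Criterion], response: str) -> float:
    """Weighted average of criterion scores (an assumed aggregation rule)."""
    total_weight = sum(c.weight for c in rubric)
    if total_weight <= 0:
        raise ValueError("rubric weights must sum to a positive value")
    return sum(c.weight * c.score(response) for c in rubric) / total_weight


def discriminative_reward(
    rubric: Sequence[Criterion],
    user_response: str,
    competitor_responses: Sequence[str],
    margin: float = 0.0,
) -> float:
    """Fraction of competitor outputs that the user-authored response beats by more than `margin`.

    Ties count as losses, so a rubric that cannot separate the user's response
    from a model output earns no credit for that pair.
    """
    if not competitor_responses:
        raise ValueError("need at least one competitor response")
    user = rubric_score(rubric, user_response)
    wins = sum(
        1 for r in competitor_responses if user - rubric_score(rubric, r) > margin
    )
    return wins / len(competitor_responses)


# Toy usage: hand-written criteria standing in for an induced rubric (hypothetical).
toy_rubric = [
    Criterion("enthusiastic tone", 2.0, lambda t: 1.0 if "!" in t else 0.0),
    Criterion("concise", 1.0, lambda t: 1.0 if len(t.split()) <= 8 else 0.0),
]
user_text = "Loved it, would buy again!"
competitors = [
    "This product meets expectations adequately in most respects overall.",
    "Great!",
]
print(discriminative_reward(toy_rubric, user_text, competitors))  # 0.5
```

The user's review beats the verbose competitor but only ties with "Great!", so the reward is 0.5. A rubric that captured a finer-grained preference, for example one separating the two short responses, would earn a higher reward. That is the kind of user-specific decision boundary the contrastive objective is meant to encourage.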
Problem

Research questions and friction points this paper is trying to address.

personalized evaluation
user preferences
large language models
subjective assessment
interaction histories
Innovation

Methods, ideas, or system contributions that make the work stand out.

Preference-Aware Rubric Learning
Personalized Evaluation
Rubric Induction
Discriminative Reinforcement Learning
User-Centric Alignment