Self-Preference Bias in LLM-as-a-Judge

📅 2024-10-29
🏛️ arXiv.org
📈 Citations: 25
Influential: 3
🤖 AI Summary
Large language models (LLMs) used as evaluators (“LLM-as-a-Judge”) exhibit a systematic “self-preference bias”: they prefer texts with lower perplexity—i.e., those better aligned with their own generative distribution—over objectively higher-quality outputs. Method: We propose the first quantifiable metric for self-preference bias, validated via perplexity analysis, statistical comparison between LLM and human judgments, and cross-model preference-scoring experiments. Contribution/Results: We demonstrate that GPT-4 exhibits significant self-preference bias, with its scores strongly negatively correlated with text perplexity—a correlation markedly stronger than that observed in human evaluations. Crucially, we identify the root cause as distributional alignment (i.e., matching the model’s internal generation distribution), not mere “self-identification.” This work establishes the first interpretable, reproducible, and quantifiable benchmark for diagnosing the reliability of automated LLM-based evaluation.
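The summary does not spell out the paper's exact metric, but a minimal sketch of one plausible formulation is the gap between how often the LLM judge prefers its own outputs in pairwise comparisons and how often human evaluators do (the function name and 0/1 encoding below are illustrative assumptions, not the paper's code):

```python
from statistics import mean

def self_preference_bias(llm_prefers_self, human_prefers_self):
    # Each list holds 1 if the evaluator preferred the judge model's
    # own output in a pairwise comparison, 0 otherwise.
    # A positive value means the LLM judge favors its own outputs
    # more often than human evaluators do.
    return mean(llm_prefers_self) - mean(human_prefers_self)

# Toy data: the judge picks its own output 75% of the time; humans, 25%.
bias = self_preference_bias([1, 1, 1, 0], [1, 0, 0, 0])  # 0.5
```

Under this formulation, a bias of zero means the judge's self-preference matches human preference, and the magnitude is directly interpretable as an excess win rate.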

📝 Abstract
Automated evaluation leveraging large language models (LLMs), commonly referred to as LLM evaluators or LLM-as-a-judge, has been widely used in measuring the performance of dialogue systems. However, the self-preference bias in LLMs has posed significant risks, including promoting specific styles or policies intrinsic to the LLMs. Despite the importance of this issue, there is a lack of established methods to measure the self-preference bias quantitatively, and its underlying causes are poorly understood. In this paper, we introduce a novel quantitative metric to measure the self-preference bias. Our experimental results demonstrate that GPT-4 exhibits a significant degree of self-preference bias. To explore the causes, we hypothesize that LLMs may favor outputs that are more familiar to them, as indicated by lower perplexity. We analyze the relationship between LLM evaluations and the perplexities of outputs. Our findings reveal that LLMs assign significantly higher evaluations to outputs with lower perplexity than human evaluators, regardless of whether the outputs were self-generated. This suggests that the essence of the bias lies in perplexity and that the self-preference bias exists because LLMs prefer texts more familiar to them.
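The perplexity analysis described above can be sketched as follows. This is a hedged stand-in, not the paper's implementation: perplexity is computed from per-token log-probabilities, and a plain Pearson correlation checks whether judge scores fall as perplexity rises (the negative relationship the paper reports for GPT-4):

```python
import math
from statistics import mean

def perplexity(token_logprobs):
    # Perplexity = exp(negative mean per-token log-probability);
    # lower values mean the text is more "familiar" to the model.
    return math.exp(-mean(token_logprobs))

def pearson(xs, ys):
    # Plain Pearson correlation coefficient, no external dependencies.
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

# Toy data: judge scores drop as output perplexity rises.
ppls = [perplexity(lp) for lp in ([-1.2, -0.9], [-2.0, -2.4], [-3.1, -3.5])]
scores = [9.0, 6.0, 3.0]
r = pearson(ppls, scores)  # strongly negative on this toy data
```

In practice the log-probabilities would come from the judge model itself (e.g. via its tokenizer and logits), which is what ties "familiarity" to the model's own generative distribution.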
Problem

Research questions and friction points this paper is trying to address.

Measuring self-preference bias in LLM evaluators quantitatively
Exploring causes of bias via perplexity and familiarity
Comparing LLM and human evaluator preferences for outputs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces a quantitative metric for self-preference bias
Links the bias to familiarity, as measured by lower output perplexity
Compares LLM and human evaluator preferences on the same outputs