Rethinking Human Preference Evaluation of LLM Rationales

📅 2025-09-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current LLM explanation evaluation relies heavily on binary preference judgments, which lack transparency and fine-grained attribution. To address this, the paper proposes an attribute-based fine-grained evaluation framework: first, it identifies key explainability attributes of high-quality reasoning (e.g., logical coherence and factual accuracy); second, it integrates automated metrics, LLM-based judgment, and human annotations into a multi-source scoring system; third, it employs SHAP analysis to quantify each attribute's contribution to human preferences and designs attribute-specific Elo scoring for interpretable model comparison. Experiments on MT-Bench and Chatbot Arena demonstrate that the framework significantly improves evaluation transparency and reliability. Attribute scores exhibit strong explanatory power for human preferences (R² > 0.85), enabling precise, attributable model diagnosis and ranking.
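The SHAP step above attributes a preference model's output to individual rationale attributes. A minimal stdlib-only sketch of exact Shapley attribution over a toy linear preference model follows; the attribute names, scores, and weights are illustrative assumptions, not values from the paper:

```python
from itertools import combinations
from math import factorial

# Hypothetical attribute score deltas (rationale A minus rationale B) and
# hypothetical learned weights of a linear preference model.
ATTRIBUTES = {"logical_coherence": 0.8, "factual_accuracy": 0.5, "conciseness": -0.2}
WEIGHTS = {"logical_coherence": 2.0, "factual_accuracy": 1.5, "conciseness": 0.5}

def preference_score(subset):
    """Toy preference model: weighted sum over the attributes in `subset`."""
    return sum(WEIGHTS[a] * ATTRIBUTES[a] for a in subset)

def shapley_values(features):
    """Exact Shapley attribution of preference_score across attributes."""
    n = len(features)
    values = {}
    for f in features:
        others = [x for x in features if x != f]
        total = 0.0
        for k in range(n):  # subset sizes 0 .. n-1 over the other features
            for subset in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                # Marginal contribution of f when added to this subset.
                total += weight * (preference_score(set(subset) | {f})
                                   - preference_score(subset))
        values[f] = total
    return values

phi = shapley_values(list(ATTRIBUTES))
```

For a linear model the Shapley value of each attribute reduces to weight times score, and the values sum to the full model output (the efficiency property), which makes this a useful sanity check before applying approximate SHAP explainers to nonlinear preference models.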

📝 Abstract
Large language models (LLMs) often generate natural language rationales -- free-form explanations that help improve performance on complex reasoning tasks and enhance interpretability for human users. However, evaluating these rationales remains challenging. While recent work has relied on binary preference judgments from humans or LLM judges, such evaluations are often opaque and coarse-grained, offering limited insight into what makes one rationale better than another. In this work, we rethink preference evaluation for LLM-generated rationales by asking: (1) What attributes define good rationales? (2) Can human preferences be explained by these attributes? (3) Can attribute-based evaluation overcome the limitations of binary comparisons? We identify a set of key rationale attributes from prior literature and assess them using automatic metrics, LLM judgments, and human annotations. We then analyze two standard human preference datasets, MT-Bench and Chatbot Arena, using SHAP to identify which attributes best explain human preference outcomes. Finally, we re-evaluate model-generated rationales using attribute-specific ELO scores, revealing more nuanced model comparisons and insights. Our findings suggest that fine-grained attribute evaluations can better characterize rationale quality and guide future research toward more interpretable and reliable evaluation practices.
Problem

Research questions and friction points this paper is trying to address.

Identifying key attributes defining high-quality LLM rationales
Analyzing how rationale attributes explain human preference outcomes
Developing fine-grained evaluation methods for rationale comparisons
Innovation

Methods, ideas, or system contributions that make the work stand out.

Attribute-based evaluation using SHAP analysis
Fine-grained ELO scores for rationale quality
Multi-method assessment with automatic and human metrics
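The attribute-specific Elo idea above can be sketched as keeping one rating per (model, attribute) pair and updating it only from pairwise comparisons judged on that attribute. The K-factor, initial rating, and names below are conventional illustrative choices, not the paper's implementation:

```python
# Attribute-specific Elo: a separate rating per (model, attribute) pair.
K = 32          # conventional update step, assumed here
INITIAL = 1000.0  # assumed starting rating

def expected(r_a, r_b):
    """Expected win probability of A under the logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def update(ratings, attribute, model_a, model_b, outcome):
    """Apply one comparison judged on `attribute`.

    outcome: 1.0 if A's rationale wins, 0.0 if B wins, 0.5 for a tie.
    """
    ra = ratings.setdefault((model_a, attribute), INITIAL)
    rb = ratings.setdefault((model_b, attribute), INITIAL)
    e = expected(ra, rb)
    ratings[(model_a, attribute)] = ra + K * (outcome - e)
    ratings[(model_b, attribute)] = rb + K * ((1.0 - outcome) - (1.0 - e))
    return ratings

ratings = {}
update(ratings, "factual_accuracy", "model_x", "model_y", 1.0)
```

Because ratings are keyed by attribute, a model can rank highly on factual accuracy while ranking low on conciseness, which is exactly the kind of nuanced comparison a single aggregate Elo score hides.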