🤖 AI Summary
Large language models (LLMs) frequently generate free-text explanations for high-stakes AI decisions that are inconsistent with their own predictions: in the authors' analysis, more than 62% of sampled explanations lack prediction-explanation (PEX) consistency. Method: The paper introduces a learnable and evaluable PEX consistency measure that extends the weight-of-evidence concept to quantify how strongly a free-text explanation supports or opposes a prediction, and applies direct preference optimization (DPO) to improve this measure across the LLaMA, Phi, and Qwen model families. Contribution/Results: DPO training improves PEX consistency by 43.1%–292.3%, and optimizing the measure improves explanation faithfulness by up to 9.7%, establishing PEX consistency as a practical, quantifiable aspect of explanation faithfulness for explainable AI.
📝 Abstract
Faithful free-text explanations are important for ensuring transparency in high-stakes AI decision-making contexts, but they are challenging for language models to generate and for humans to assess. In this paper, we present a measure of Prediction-EXplanation (PEX) consistency, obtained by extending the concept of weight of evidence. This measure quantifies how much a free-text explanation supports or opposes a prediction, serving as an important aspect of explanation faithfulness. Our analysis reveals that more than 62% of explanations generated by large language models lack this consistency. We show that applying direct preference optimization improves the consistency of generated explanations across three model families, with improvements ranging from 43.1% to 292.3%. Furthermore, we demonstrate that optimizing this consistency measure can improve explanation faithfulness by up to 9.7%.
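The abstract's core idea, extending weight of evidence to score how much an explanation supports or opposes a prediction, can be illustrated with a minimal sketch. The sketch below uses the classical weight-of-evidence log-likelihood ratio; the function names, the per-unit probability inputs, and the simple summed aggregation are illustrative assumptions, not the paper's actual formulation.

```python
import math

def weight_of_evidence(p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Classical weight of evidence: log-likelihood ratio of evidence e
    for hypothesis h versus its negation. Positive values favor h."""
    return math.log(p_e_given_h / p_e_given_not_h)

def pex_consistency(evidence_probs, prediction: int) -> float:
    """Toy PEX-style consistency score (illustrative, not the paper's metric).

    evidence_probs: list of (P(e_i | y=1), P(e_i | y=0)) pairs, one per
                    explanation unit e_i (e.g., a sentence).
    prediction:     the model's predicted label (1 or 0).

    Sums the evidential weights and signs the total toward the prediction:
    a positive score means the explanation, on balance, supports the
    prediction; a negative score means it opposes it (inconsistency).
    """
    total = sum(weight_of_evidence(p1, p0) for p1, p0 in evidence_probs)
    return total if prediction == 1 else -total

# Toy example: two explanation sentences support label 1, one opposes it.
probs = [(0.8, 0.2), (0.7, 0.3), (0.4, 0.6)]
print(round(pex_consistency(probs, prediction=1), 3))  # → 1.828
```

Under this sketch, an inconsistent explanation is one whose summed evidential weight points away from the predicted label; a preference-optimization step such as DPO would then prefer generations with higher scores.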