Can we trust LLM Self-Explanations for Entity Resolution?

📅 2026-05-31

🤖 AI Summary

This study systematically evaluates the reliability of the self-explanations that large language models (LLMs) generate for entity resolution (ER). It finds that these explanations are frequently unstable and weakly faithful. This holds for feature attributions at both the attribute and token levels, and for counterfactual explanations. To address these limitations, the authors propose Uncerta, a hybrid explanation framework that uses self-explanations as priors to guide efficient post-hoc exploration. Uncerta is evaluated with three LLMs on ten diverse datasets. It achieves explanation quality comparable to purely post-hoc methods while reducing computational cost by up to an order of magnitude.
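The two failure modes named above, instability and weak faithfulness, can be made concrete with a small, hypothetical sketch. It assumes attribute-level self-explanations arrive as rankings (most important attribute first). Stability is measured as the mean pairwise top-k Jaccard overlap across repeated prompts for the same record pair. Faithfulness is measured with a simple deletion test: how much the match probability falls when the attributes an explanation names are blanked out. The matcher `toy_match_prob`, its weights, and the example records are invented stand-ins for an LLM, and the paper's exact metrics may differ.

```python
from itertools import combinations
from typing import Callable, Dict, List

Record = Dict[str, str]  # one entity record: attribute name -> value


def toy_match_prob(left: Record, right: Record) -> float:
    """Hypothetical stand-in for an LLM matcher: weighted share of equal, non-empty attributes."""
    weights = {"title": 0.6, "brand": 0.3, "price": 0.1}
    return sum(w for attr, w in weights.items()
               if left.get(attr) and left.get(attr) == right.get(attr))


def topk_jaccard(a: List[str], b: List[str], k: int) -> float:
    """Overlap of the top-k attributes named by two explanations (1.0 = same set)."""
    sa, sb = set(a[:k]), set(b[:k])
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def stability(rankings: List[List[str]], k: int) -> float:
    """Mean pairwise top-k Jaccard over repeated self-explanations of one record pair."""
    pairs = list(combinations(rankings, 2))
    if not pairs:
        raise ValueError("need at least two explanation runs")
    return sum(topk_jaccard(a, b, k) for a, b in pairs) / len(pairs)


def mask(record: Record, attrs: List[str]) -> Record:
    """Blank out the given attributes (one simple perturbation choice among many)."""
    return {key: ("" if key in attrs else val) for key, val in record.items()}


def deletion_drop(match_prob: Callable[[Record, Record], float],
                  left: Record, right: Record, attrs: List[str]) -> float:
    """Drop in match probability after masking `attrs` on both records."""
    return match_prob(left, right) - match_prob(mask(left, attrs), mask(right, attrs))


left = {"title": "iPhone 13 128GB", "brand": "Apple", "price": "799"}
right = {"title": "iPhone 13 128GB", "brand": "Apple", "price": "779"}
# Three hypothetical self-explanations (most important attribute first) for the same pair.
runs = [["title", "brand", "price"], ["brand", "price", "title"], ["title", "price", "brand"]]

print(round(stability(runs, k=1), 3))  # 0.333: the runs disagree on the top attribute
for run in runs:
    # Deletion test on each run's top-1 attribute.
    print(run[0], round(deletion_drop(toy_match_prob, left, right, run[:1]), 2))
```

The loop prints `title 0.6`, `brand 0.3`, `title 0.6`. The run that ranks `brand` first points at an attribute whose removal moves the prediction half as much as removing `title`. This is the kind of plausible-but-unfaithful ranking that a deletion test exposes and that reading the explanation alone does not.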

📝 Abstract

Large Language Models (LLMs) have recently shown strong performance on Entity Resolution (ER). Beyond producing accurate predictions, these models can also be prompted to generate self-explanations alongside their predictions. While such self-explanations are appealing because they add negligible computational cost, their actual reliability remains largely unexplored. In this paper, we present the first large-scale systematic evaluation of LLM self-explanations for ER, focusing on feature attribution and counterfactual explanations at both the attribute and token levels. Across three LLMs, ten datasets, and multiple prompting strategies, we show that self-explanations are often unstable, weakly faithful, and poorly aligned with counterfactual evidence, revealing a substantial gap between plausibility and causal relevance. We further demonstrate that established post-hoc explanation methods are significantly more trustworthy, but at a prohibitive computational cost when applied to LLMs. To bridge this gap, we introduce Uncerta, a hybrid explanation framework that leverages self-explanations as priors to guide post-hoc exploration. Uncerta achieves explanation quality comparable to post-hoc methods while reducing cost by up to an order of magnitude.
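One way to read "self-explanations as priors to guide post-hoc exploration" is as a search-ordering strategy. The sketch below is a toy illustration of that general idea, not the Uncerta algorithm. A brute-force counterfactual search tries attribute subsets smallest first. The only thing that changes between runs is whether candidates are enumerated in schema order or in the order a self-explanation ranks them. The names `toy_match`, `WEIGHTS`, `counterfactual_search`, and the example records are all hypothetical, and matcher calls serve as a stand-in for LLM query cost.

```python
from itertools import combinations
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Record = Dict[str, str]

# Hypothetical attribute weights for the toy matcher below.
WEIGHTS = {"title": 0.5, "brand": 0.2, "storage": 0.1, "color": 0.1, "price": 0.1}


def toy_match(left: Record, right: Record) -> bool:
    """Hypothetical matcher standing in for an LLM: match if weighted agreement >= 0.5."""
    score = sum(w for attr, w in WEIGHTS.items()
                if left.get(attr) and left.get(attr) == right.get(attr))
    return score >= 0.5


def mask(record: Record, attrs: Iterable[str]) -> Record:
    """Blank out the given attributes."""
    drop = set(attrs)
    return {key: ("" if key in drop else val) for key, val in record.items()}


def counterfactual_search(match: Callable[[Record, Record], bool],
                          left: Record, right: Record,
                          order: List[str], max_size: int
                          ) -> Tuple[Optional[Tuple[str, ...]], int]:
    """Find the first attribute subset whose masking flips the match decision.

    Subsets are tried smallest first and, within a size, following `order`,
    so a good prior ranking reaches a flip after fewer matcher calls.
    Returns (flipping subset or None, number of matcher calls).
    """
    original = match(left, right)
    calls = 1
    for size in range(1, max_size + 1):
        for subset in combinations(order, size):
            calls += 1
            if match(mask(left, subset), mask(right, subset)) != original:
                return subset, calls
    return None, calls


left = {"title": "Galaxy S21", "brand": "Samsung", "storage": "128GB",
        "color": "black", "price": "699"}
right = dict(left, price="649")

schema_order = ["price", "color", "storage", "brand", "title"]  # no prior
prior_order = ["title", "brand", "storage", "color", "price"]   # hypothetical self-explanation ranking

print(counterfactual_search(toy_match, left, right, schema_order, max_size=2))  # (('title',), 6)
print(counterfactual_search(toy_match, left, right, prior_order, max_size=2))   # (('title',), 2)
```

With the prior, the same minimal counterfactual is found after 2 matcher calls instead of 6. If the prior ranking is wrong, the search still returns a counterfactual that the matcher has actually verified (or none) within the same worst-case budget as exhaustive enumeration; only the number of calls suffers. That is one plausible reason to treat self-explanations as priors rather than as final explanations.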

Problem

Research questions and friction points this paper is trying to address.

Entity Resolution

LLM Self-Explanations

Feature Attribution

Counterfactual Explanations

Explanation Reliability

Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-Explanations

Entity Resolution

Post-hoc Explanations