Evaluating the effectiveness of XAI techniques for encoder-based language models

📅 2025-01-01
🏛️ Knowledge-Based Systems
📈 Citations: 0 · Influential citations: 0
🤖 AI Summary
To address the lack of interpretability and trustworthiness stemming from the “black-box” nature of large language models (LLMs), this paper systematically evaluates eXplainable AI (XAI) methods on encoder-based models such as BERT and RoBERTa. We propose the first multi-dimensional XAI evaluation framework tailored specifically to encoder architectures, introducing two novel metrics: counterfactual robustness and cognitive load quantification. We comparatively assess prominent attribution techniques (Integrated Gradients, Layer-wise Relevance Propagation, and Attention Rollout) alongside a newly proposed causal masking analysis. Our empirical analysis reveals that attention-based visualizations pervasively over-attribute importance to salient tokens, whereas gradient-based methods achieve up to a 37% improvement in explanation fidelity across downstream tasks. To foster reproducible research, we open-source the Encoder-XAI benchmark suite, a standardized evaluation toolkit with comprehensive empirical baselines, establishing a foundation for rigorous, comparable, and theory-informed interpretability studies of encoder models.
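
As a concrete illustration of one of the compared techniques, below is a minimal sketch of Attention Rollout (Abnar & Zuidema, 2020) applied to a BERT encoder. This is a reconstruction of the published algorithm, not the paper's released code; the checkpoint name, the example sentence, and the use of the [CLS] row as the relevance readout are illustrative assumptions.

```python
# Minimal Attention Rollout sketch for an encoder model (illustrative only;
# not the paper's released code). Requires: torch, transformers.
import torch
from transformers import AutoModel, AutoTokenizer

def attention_rollout(attentions):
    """Combine per-layer attention maps into token-to-token relevance.

    attentions: tuple of (batch, heads, seq, seq) tensors, one per layer.
    Returns a (batch, seq, seq) rollout matrix.
    """
    rollout = None
    for layer_attn in attentions:
        attn = layer_attn.mean(dim=1)  # average over heads
        # Add the identity to account for residual connections,
        # then re-normalize rows so they sum to 1.
        eye = torch.eye(attn.size(-1), device=attn.device).unsqueeze(0)
        attn = attn + eye
        attn = attn / attn.sum(dim=-1, keepdim=True)
        # Propagate relevance through the layers by matrix multiplication.
        rollout = attn if rollout is None else attn @ rollout
    return rollout

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed checkpoint
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tokenizer("The movie was surprisingly good.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

rollout = attention_rollout(outputs.attentions)
scores = rollout[0, 0]  # relevance of each token to the [CLS] position
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for tok, s in zip(tokens, scores.tolist()):
    print(f"{tok:>12s}  {s:.3f}")
```

The over-attribution bias the summary describes would surface here as disproportionately large scores on a few salient tokens; fidelity-oriented evaluations typically check such maps against deletion or counterfactual tests rather than taking them at face value.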

Problem

Research questions and friction points this paper is trying to address.

XAI methods
large language models
interpretability
Innovation

Methods, ideas, or system contributions that make the work stand out.

XAI Evaluation Framework
Human Understandability
Stability and Consistency