🤖 AI Summary
To address the limited interpretability and trustworthiness stemming from the "black-box" nature of large language models (LLMs), this paper systematically evaluates eXplainable AI (XAI) methods on encoder-based models such as BERT and RoBERTa. We propose the first multi-dimensional XAI evaluation framework tailored to encoder architectures, introducing two novel metrics: counterfactual robustness and cognitive-load quantification. We comparatively assess prominent attribution techniques, namely Integrated Gradients, Layer-wise Relevance Propagation, Attention Rollout, and a newly proposed causal masking analysis. Our empirical analysis reveals a pervasive bias in attention-based visualizations, which over-attribute importance to salient tokens, whereas gradient-based methods achieve up to a 37% improvement in explanation fidelity across downstream tasks. To foster reproducible research, we open-source the Encoder-XAI benchmark suite, a standardized evaluation toolkit with comprehensive empirical baselines, establishing a foundation for rigorous, comparable, and theory-informed interpretability studies of encoder models.
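For readers unfamiliar with the gradient-based methods the summary refers to, the core idea of Integrated Gradients can be sketched in a few lines. This is a minimal, self-contained toy illustration (a sigmoid over a weighted sum standing in for a classifier logit), not the paper's actual models or benchmark code:

```python
import numpy as np

# Toy differentiable "model": sigmoid of a weighted sum, standing in for
# a classifier logit. Weights and inputs are arbitrary illustrative values.
w = np.array([0.7, -1.2, 0.5])

def model(x):
    return 1.0 / (1.0 + np.exp(-np.dot(w, x)))

def grad(x):
    # Analytic gradient of sigmoid(w . x) with respect to x
    s = model(x)
    return s * (1.0 - s) * w

def integrated_gradients(x, baseline, steps=200):
    # Midpoint Riemann-sum approximation of
    # IG_i = (x_i - b_i) * integral_0^1 dF/dx_i (b + a*(x - b)) da
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x)
    for a in alphas:
        total += grad(baseline + a * (x - baseline))
    return (x - baseline) * total / steps

x = np.array([1.0, 0.5, -0.3])
b = np.zeros(3)
attr = integrated_gradients(x, b)
# Completeness axiom: attributions should sum to F(x) - F(baseline)
print(attr, attr.sum(), model(x) - model(b))
```

The completeness check at the end is the property that makes such attributions comparable across tokens; in practice one would compute the gradients with autograd over embedding inputs (e.g., via a library such as Captum) rather than analytically.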