🤖 AI Summary
To address the limited interpretability and trustworthiness stemming from the "black-box" nature of large language models (LLMs), this paper systematically evaluates eXplainable AI (XAI) methods on encoder-based models such as BERT and RoBERTa. We propose the first multi-dimensional XAI evaluation framework tailored to encoder architectures, introducing two novel metrics: counterfactual robustness and cognitive-load quantification. We comparatively assess prominent attribution techniques, namely Integrated Gradients, Layer-wise Relevance Propagation, Attention Rollout, and a newly proposed causal masking analysis. Our empirical analysis reveals a pervasive bias in attention-based visualizations, which over-attribute importance to salient tokens, whereas gradient-based methods achieve up to a 37% improvement in explanation fidelity across downstream tasks. To foster reproducible research, we open-source the Encoder-XAI benchmark suite, a standardized evaluation toolkit with comprehensive empirical baselines, establishing a foundation for rigorous, comparable, and theory-informed interpretability studies of encoder models.
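For readers unfamiliar with the gradient-based methods the summary refers to, the core idea of Integrated Gradients can be sketched in a few lines. This is a minimal, self-contained toy illustration (a sigmoid over a weighted sum standing in for a classifier logit), not the paper's actual models or benchmark code:

```python
import numpy as np

# Toy differentiable "model": sigmoid of a weighted sum, standing in for
# a classifier logit. Weights and inputs are arbitrary illustrative values.
w = np.array([0.7, -1.2, 0.5])

def model(x):
    return 1.0 / (1.0 + np.exp(-np.dot(w, x)))

def grad(x):
    # Analytic gradient of sigmoid(w . x) with respect to x
    s = model(x)
    return s * (1.0 - s) * w

def integrated_gradients(x, baseline, steps=200):
    # Midpoint Riemann-sum approximation of
    # IG_i = (x_i - b_i) * integral_0^1 dF/dx_i (b + a*(x - b)) da
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x)
    for a in alphas:
        total += grad(baseline + a * (x - baseline))
    return (x - baseline) * total / steps

x = np.array([1.0, 0.5, -0.3])
b = np.zeros(3)
attr = integrated_gradients(x, b)
# Completeness axiom: attributions should sum to F(x) - F(baseline)
print(attr, attr.sum(), model(x) - model(b))
```

The completeness check at the end is the property that makes such attributions comparable across tokens; in practice one would compute the gradients with autograd over embedding inputs (e.g., via a library such as Captum) rather than analytically.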