A Unified Framework with Novel Metrics for Evaluating the Effectiveness of XAI Techniques in LLMs

📅 2025-03-06
🤖 AI Summary
This study addresses the lack of a unified framework for evaluating the interpretability of large language models (LLMs). We propose a four-dimensional quantitative evaluation framework—covering transparency, robustness, consistency, and contrastivity—and introduce four novel metrics, including Human-reasoning Agreement. For the first time, we conduct a systematic, cross-model (e.g., LLaMA, ChatGLM) and cross-task (sentiment analysis, text classification) empirical comparison of five XAI methods—LIME, SHAP, Integrated Gradients, Layer-wise Relevance Propagation (LRP), and Attention Mechanism Visualization (AMV)—on the IMDB and Tweet Sentiment datasets. Results show that LIME achieves the best overall performance; AMV excels in robustness and consistency; and LRP demonstrates the strongest contrastivity on complex models. This work establishes the first empirically grounded, model- and task-agnostic benchmark for XAI method selection, providing both practical guidance and theoretical foundations for interpretability assessment in LLMs.

📝 Abstract
The increasing complexity of LLMs presents significant challenges to their transparency and interpretability, necessitating the use of eXplainable AI (XAI) techniques to enhance trustworthiness and usability. This study introduces a comprehensive evaluation framework with four novel metrics for assessing the effectiveness of five XAI techniques across five LLMs and two downstream tasks. We apply this framework to evaluate five XAI techniques—LIME, SHAP, Integrated Gradients, Layer-wise Relevance Propagation (LRP), and Attention Mechanism Visualization (AMV)—using the IMDB Movie Reviews and Tweet Sentiment Extraction datasets. The evaluation focuses on four key metrics: Human-reasoning Agreement (HA), Robustness, Consistency, and Contrastivity. Our results show that LIME consistently achieves high scores across multiple LLMs and evaluation metrics, while AMV demonstrates superior Robustness and near-perfect Consistency. LRP excels in Contrastivity, particularly with more complex models. Our findings provide valuable insights into the strengths and limitations of different XAI methods, offering guidance for developing and selecting appropriate XAI techniques for LLMs.
Problem

Research questions and friction points this paper is trying to address.

Evaluates XAI techniques for LLM transparency and interpretability.
Introduces novel metrics: HA, Robustness, Consistency, Contrastivity.
Compares LIME, SHAP, Integrated Gradients, LRP, and AMV across LLMs and tasks.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Novel metrics for XAI effectiveness evaluation
Framework assesses five XAI techniques on LLMs
Focus on HA, Robustness, Consistency, Contrastivity
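The paper does not spell out its metric formulas in this summary, but the three attribution-level metrics can be illustrated with a minimal sketch. The snippet below is a hypothetical implementation, assuming an explanation is a vector of per-token importance scores: Consistency is scored as agreement between attributions from two runs on the same input, Robustness as stability under a small input perturbation, and HA as overlap of the top-attributed tokens with a human-marked rationale. The function names and the cosine/overlap choices are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two token-importance vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def consistency(attr_run1, attr_run2):
    """Agreement between attributions produced in two runs on the same input."""
    return cosine(attr_run1, attr_run2)

def robustness(attr_original, attr_perturbed):
    """Stability of attributions when the input is slightly perturbed."""
    return cosine(attr_original, attr_perturbed)

def human_agreement(attr, human_rationale, k=2):
    """Fraction of the top-k attributed tokens that humans also marked."""
    top_k = set(np.argsort(attr)[-k:])
    return len(top_k & set(human_rationale)) / k

# Toy importance scores for a 5-token input, from two runs of one explainer
base  = [0.90, 0.10, 0.05, 0.80, 0.00]
rerun = [0.85, 0.12, 0.07, 0.78, 0.01]

print(consistency(base, rerun))          # near 1.0: highly consistent
print(human_agreement(base, {0, 3}))     # 1.0: top-2 tokens match the rationale
```

Under this sketch, higher is better on all three scores; Contrastivity would additionally compare attributions across different predicted classes, which requires class-conditional explanations and is omitted here.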
Melkamu Abay Mersha
PhD Candidate, University of Colorado Colorado Springs
AI, Machine Learning, XAI, NLP
M. Yigezu
Instituto Politécnico Nacional (IPN), Centro de Investigación en Computación (CIC), 07738, Mexico City, Mexico
Hassan Shakil
Ph.D. Candidate in Computer Science, University of Colorado Colorado Springs (UCCS)
Large Language Models, Natural Language Processing
Ali Al shami
College of Engineering and Applied Science, University of Colorado Colorado Springs, 80918, CO, USA
S. Byun
College of Engineering and Applied Science, University of Colorado Colorado Springs, 80918, CO, USA
Jugal K. Kalita
College of Engineering and Applied Science, University of Colorado Colorado Springs, 80918, CO, USA