A Unified Framework with Novel Metrics for Evaluating the Effectiveness of XAI Techniques in LLMs

📅 2025-03-06
🤖 AI Summary
This study addresses the lack of a unified framework for evaluating the interpretability of large language models (LLMs). We propose a four-dimensional quantitative evaluation framework—covering transparency, robustness, consistency, and contrastivity—and introduce four novel metrics, including Human-reasoning Agreement. For the first time, we conduct a systematic, cross-model (e.g., LLaMA, ChatGLM) and cross-task (sentiment analysis, text classification) empirical comparison of five XAI methods—LIME, SHAP, Integrated Gradients, Layer-wise Relevance Propagation (LRP), and Attention Mechanism Visualization (AMV)—on the IMDB and Tweet Sentiment datasets. Results show that LIME achieves the best overall performance; AMV excels in robustness and consistency; and LRP demonstrates the strongest contrastivity on complex models. This work establishes the first empirically grounded, model- and task-agnostic benchmark for XAI method selection, providing both practical guidance and theoretical foundations for interpretability assessment in LLMs.

📝 Abstract
The increasing complexity of LLMs presents significant challenges to their transparency and interpretability, necessitating the use of eXplainable AI (XAI) techniques to enhance trustworthiness and usability. This study introduces a comprehensive evaluation framework with four novel metrics for assessing the effectiveness of five XAI techniques across five LLMs and two downstream tasks. We apply this framework to evaluate five XAI techniques—LIME, SHAP, Integrated Gradients, Layer-wise Relevance Propagation (LRP), and Attention Mechanism Visualization (AMV)—using the IMDB Movie Reviews and Tweet Sentiment Extraction datasets. The evaluation focuses on four key metrics: Human-reasoning Agreement (HA), Robustness, Consistency, and Contrastivity. Our results show that LIME consistently achieves high scores across multiple LLMs and evaluation metrics, while AMV demonstrates superior Robustness and near-perfect Consistency. LRP excels in Contrastivity, particularly with more complex models. Our findings provide valuable insights into the strengths and limitations of different XAI methods, offering guidance for developing and selecting appropriate XAI techniques for LLMs.
Problem

Research questions and friction points this paper is trying to address.

Evaluates XAI techniques for LLM transparency and interpretability.
Introduces novel metrics: HA, Robustness, Consistency, Contrastivity.
Compares LIME, SHAP, Integrated Gradients, LRP, and AMV across LLMs and tasks.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Novel metrics for XAI effectiveness evaluation
Framework assesses five XAI techniques on LLMs
Focus on HA, Robustness, Consistency, Contrastivity
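The paper does not spell out its metric formulas in this summary, but the three attribution-level metrics can be illustrated with a minimal sketch. The snippet below is a hypothetical implementation, assuming an explanation is a vector of per-token importance scores: Consistency is scored as agreement between attributions from two runs on the same input, Robustness as stability under a small input perturbation, and HA as overlap of the top-attributed tokens with a human-marked rationale. The function names and the cosine/overlap choices are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two token-importance vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def consistency(attr_run1, attr_run2):
    """Agreement between attributions produced in two runs on the same input."""
    return cosine(attr_run1, attr_run2)

def robustness(attr_original, attr_perturbed):
    """Stability of attributions when the input is slightly perturbed."""
    return cosine(attr_original, attr_perturbed)

def human_agreement(attr, human_rationale, k=2):
    """Fraction of the top-k attributed tokens that humans also marked."""
    top_k = set(np.argsort(attr)[-k:])
    return len(top_k & set(human_rationale)) / k

# Toy importance scores for a 5-token input, from two runs of one explainer
base  = [0.90, 0.10, 0.05, 0.80, 0.00]
rerun = [0.85, 0.12, 0.07, 0.78, 0.01]

print(consistency(base, rerun))          # near 1.0: highly consistent
print(human_agreement(base, {0, 3}))     # 1.0: top-2 tokens match the rationale
```

Under this sketch, higher is better on all three scores; Contrastivity would additionally compare attributions across different predicted classes, which requires class-conditional explanations and is omitted here.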
Melkamu Abay Mersha
PhD Candidate, University of Colorado Colorado Springs
AI, Machine Learning, XAI, NLP
M. Yigezu
Instituto Politécnico Nacional (IPN), Centro de Investigación en Computación (CIC), 07738, Mexico City, Mexico
Hassan Shakil
Ph.D. Candidate in Computer Science, University of Colorado Colorado Springs (UCCS)
Large Language Models, Natural Language Processing
Ali Al shami
College of Engineering and Applied Science, University of Colorado Colorado Springs, 80918, CO, USA
S. Byun
College of Engineering and Applied Science, University of Colorado Colorado Springs, 80918, CO, USA
Jugal K. Kalita
College of Engineering and Applied Science, University of Colorado Colorado Springs, 80918, CO, USA