Towards Transparent AI: A Survey on Explainable Large Language Models

📅 2025-06-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) face a trust bottleneck in high-stakes applications because their decision-making processes are opaque. Method: This work introduces the first architecture-aware taxonomy of eXplainable AI (XAI) methods for LLMs, systematically categorizing techniques for encoder-only, decoder-only, and encoder-decoder architectures on the basis of Transformer fundamentals, and constructs the first comprehensive interpretability landscape tailored to LLM architectures. It further proposes a cross-architectural mapping framework from explanations to applications, together with structured evaluation metrics. Contribution/Results: The study surveys and analyzes over 120 XAI methods, uncovering intrinsic relationships among model architecture, explanation methodology, and evaluation criteria. It fills a critical gap in systematic LLM interpretability research and provides both theoretical foundations and practical guidelines for developing transparent, trustworthy, and accountable next-generation foundation models.
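
As a concrete point of reference for the encoder-only branch of the taxonomy, below is a minimal sketch that extracts attention weights from BERT via the Hugging Face transformers library, one of the simplest attention-based explanation techniques in this family. The model name, the layer choice, and the use of the [CLS] row as an importance proxy are illustrative assumptions, not prescriptions from the survey.

```python
# Minimal sketch: attention-based explanation for an encoder-only model.
# Assumptions: Hugging Face `transformers` is installed; "bert-base-uncased"
# stands in for any encoder-only LLM; the [CLS] attention row is only a
# rough proxy for token importance.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

inputs = tokenizer("Explainability builds trust in high-stakes domains.",
                   return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions: one (batch, heads, seq, seq) tensor per layer.
last_layer = outputs.attentions[-1][0]   # (heads, seq, seq)
avg_attn = last_layer.mean(dim=0)        # average over attention heads

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, weight in zip(tokens, avg_attn[0]):  # row 0 = attention from [CLS]
    print(f"{token:>15s}  {weight.item():.3f}")
```

Raw attention weights are a contested explanation signal, which is one reason taxonomies like this one also cover gradient- and perturbation-based alternatives.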

📝 Abstract
Large Language Models (LLMs) have played a pivotal role in advancing Artificial Intelligence (AI). However, despite their achievements, LLMs often struggle to explain their decision-making processes, rendering them 'black boxes' and posing a substantial challenge to explainability. This lack of transparency is a significant obstacle to the adoption of LLMs in high-stakes applications, where interpretability is essential. To overcome these limitations, researchers have developed various explainable artificial intelligence (XAI) methods that provide human-interpretable explanations for LLMs. However, a systematic understanding of these methods remains limited. To address this gap, this survey provides a comprehensive review of explainability techniques, categorizing XAI methods by the underlying transformer architecture of LLMs: encoder-only, decoder-only, and encoder-decoder models. These techniques are then examined in terms of how their explanations are evaluated, and the survey further explores how explanations are leveraged in practical applications. Finally, it discusses available resources, ongoing research challenges, and future directions, aiming to guide continued efforts toward developing transparent and responsible LLMs.
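
The abstract's point about evaluating explanations can be made concrete with a faithfulness metric. Below is a minimal sketch of comprehensiveness, which measures how much a classifier's confidence drops when the highest-attributed tokens are masked; the function name, the predict_proba callable, and the [MASK] placeholder are hypothetical stand-ins, not an API defined by the survey.

```python
# Minimal sketch of a faithfulness metric (comprehensiveness):
# confidence drop after removing the k highest-attributed tokens.
from typing import Callable, List

def comprehensiveness(
    tokens: List[str],
    attributions: List[float],
    predict_proba: Callable[[str], float],  # P(target class | text); assumed
    k: int = 3,
    mask: str = "[MASK]",                   # placeholder token; assumed
) -> float:
    """Return the confidence drop after masking the top-k attributed tokens."""
    top_k = set(sorted(range(len(tokens)),
                       key=lambda i: attributions[i], reverse=True)[:k])
    masked = [mask if i in top_k else tok for i, tok in enumerate(tokens)]
    # A faithful explanation should identify tokens the model truly relies on,
    # so masking them should produce a large confidence drop.
    return predict_proba(" ".join(tokens)) - predict_proba(" ".join(masked))
```

Higher comprehensiveness suggests the explanation highlights tokens the model actually depends on; the complementary metric, sufficiency, keeps only the top-k tokens and checks how much confidence is retained.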
Problem

Research questions and friction points this paper is trying to address.

LLMs lack transparency in their decision-making processes
High-stakes domains require interpretable, explainable AI methods
A systematic understanding of XAI techniques across LLM architectures is still missing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Categorizes XAI methods by underlying transformer architecture: encoder-only, decoder-only, and encoder-decoder (a decoder-only example is sketched after this list)
Examines how explainability techniques are evaluated, using structured criteria
Explores how XAI explanations are leveraged in practical applications
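
To complement the encoder-only attention example above with the decoder-only branch of the taxonomy, here is a hedged sketch of gradient × input attribution for an autoregressive model. The gpt2 checkpoint and the greedy choice of target token are illustrative assumptions.

```python
# Minimal sketch: gradient x input attribution for a decoder-only model.
# Assumptions: Hugging Face `transformers` is installed; "gpt2" stands in
# for any decoder-only LLM; the target is simply the model's top next token.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

input_ids = tokenizer("The capital of France is", return_tensors="pt").input_ids

# Embed tokens manually so gradients can flow back to the embeddings.
embeds = model.get_input_embeddings()(input_ids).detach().requires_grad_(True)
logits = model(inputs_embeds=embeds).logits

# Attribute the model's top next-token prediction to each input token.
next_token_logits = logits[0, -1]
next_token_logits[next_token_logits.argmax()].backward()

# Per-token relevance: elementwise gradient x embedding, summed over dims.
saliency = (embeds.grad * embeds).sum(dim=-1)[0]
for tok, score in zip(tokenizer.convert_ids_to_tokens(input_ids[0]), saliency):
    print(f"{tok:>10s}  {score.item():+.4f}")
```

Gradient × input is one of the simpler attribution schemes in this family; integrated gradients and perturbation-based methods trade extra compute for more stable attributions.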