Explainability of Large Language Models using SMILE: Statistical Model-agnostic Interpretability with Local Explanations

📅 2025-05-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the opacity and lack of interpretability in large language model (LLM) decision-making, this paper proposes SMILE—a model-agnostic, gradient-free local attribution method. SMILE quantifies token-level importance via statistically driven input perturbation and output sensitivity analysis, requiring no access to model parameters or gradients, and generates human-readable heatmaps highlighting salient components in prompts. The method is rigorously evaluated across four dimensions: accuracy, consistency, stability, and fidelity. It introduces, for the first time, a lightweight explanation framework grounded in statistical significance testing and weighted aggregation. Extensive experiments on mainstream models—including GPT, LLaMA, and Claude—demonstrate that SMILE consistently outperforms existing baselines, achieving 12–28% improvements across multiple evaluation metrics. This advancement significantly enhances LLM trustworthiness and debuggability.

Technology Category

Application Category

📝 Abstract

Large language models like GPT, LLAMA, and Claude have become incredibly powerful at generating text, but they are still black boxes, so it is hard to understand how they decide what to say. That lack of transparency can be problematic, especially in fields where trust and accountability matter. To help with this, we introduce SMILE, a new method that explains how these models respond to different parts of a prompt. SMILE is model-agnostic and works by slightly changing the input, measuring how the output changes, and then highlighting which words had the most impact. Create simple visual heat maps showing which parts of a prompt matter the most. We tested SMILE on several leading LLMs and used metrics such as accuracy, consistency, stability, and fidelity to show that it gives clear and reliable explanations. By making these models easier to understand, SMILE brings us one step closer to making AI more transparent and trustworthy.

Problem

Research questions and friction points this paper is trying to address.

Explaining black-box decision-making in large language models

Providing model-agnostic interpretability for AI-generated text

Enhancing transparency and trust in LLM outputs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Model-agnostic interpretability with local explanations

Input perturbation for impact measurement

Visual heat maps highlighting key words

🔎 Similar Papers

No similar papers found.

Authors to Follow