Simplifying Outcomes of Language Model Component Analyses with ELIA

📅 2026-02-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limited accessibility of existing interpretability tools for language models, which are often too complex for non-experts to use effectively. To bridge this gap, the authors introduce ELIA, an interactive web application that integrates three mechanistic analysis techniques (Attribution Analysis, Function Vector Analysis, and Circuit Tracing) and incorporates a vision-language model to automatically generate natural language explanations for the complex visualizations these methods produce. User studies show that ELIA significantly lowers the barrier to understanding model behavior, with the AI-generated explanations helping to mitigate knowledge disparities. Notably, users' comprehension outcomes show no significant correlation with their prior experience using large language models, underscoring the system's usability across diverse user backgrounds.

📝 Abstract
While mechanistic interpretability has developed powerful tools to analyze the internal workings of Large Language Models (LLMs), their complexity has created an accessibility gap, limiting their use to specialists. We address this challenge by designing, building, and evaluating ELIA (Explainable Language Interpretability Analysis), an interactive web application that simplifies the outcomes of various language model component analyses for a broader audience. The system integrates three key techniques -- Attribution Analysis, Function Vector Analysis, and Circuit Tracing -- and introduces a novel methodology: using a vision-language model to automatically generate natural language explanations (NLEs) for the complex visualizations produced by these methods. The effectiveness of this approach was empirically validated through a mixed-methods user study, which revealed a clear preference for interactive, explorable interfaces over simpler, static visualizations. A key finding was that the AI-powered explanations helped bridge the knowledge gap for non-experts; a statistical analysis showed no significant correlation between a user's prior LLM experience and their comprehension scores, suggesting that the system reduced barriers to comprehension across experience levels. We conclude that an AI system can indeed simplify complex model analyses, but its true power is unlocked when paired with thoughtful, user-centered design that prioritizes interactivity, specificity, and narrative guidance.
Problem

Research questions and friction points this paper is trying to address.

mechanistic interpretability
Large Language Models
accessibility gap
non-experts
model analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Explainable AI
Natural Language Explanations
Interactive Visualization
Mechanistic Interpretability
Vision-Language Models
Aaron Louis Eidt
Technische Universität Berlin, Fraunhofer Heinrich Hertz Institute
Nils Feldhus
TU Berlin, BIFOLD, DFKI (Guest)
Natural Language Processing · Interpretability · Explainable AI