🤖 AI Summary
Existing text interpretability methods are constrained by reliance on manual concept annotation or yield implicit, human-incomprehensible concepts, hindering unsupervised and trustworthy concept discovery. To address this, we propose the first unsupervised, text-oriented explainable AI framework. Our method introduces an object-centric neural architecture to automatically disentangle semantic concepts from raw text, and leverages large language models (LLMs) as *interpretability discriminators*—assessing concept clarity and human readability—within a feedback-driven reinforcement fine-tuning loop that dynamically refines concept quality. Evaluated across diverse multi-task benchmarks, our approach significantly outperforms existing state-of-the-art methods. Both human evaluations and automated metrics confirm that the discovered concepts exhibit superior interpretability, consistency, and user trustworthiness—thereby overcoming the dual bottlenecks of *uncontrollability* and *untrustworthiness* in concept-based interpretability.
📝 Abstract
Concept-based explainable approaches have emerged as a promising direction in explainable AI because they interpret models in a way that aligns with human reasoning. However, their adoption in the text domain remains limited. Most existing methods rely on predefined concept annotations and cannot discover unseen concepts, while methods that extract concepts without supervision often produce explanations that are not intuitively comprehensible to humans, potentially diminishing user trust. In short, these methods fall short of discovering comprehensible concepts automatically. To address this issue, we propose **ECO-Concept**, an intrinsically interpretable framework that discovers comprehensible concepts without concept annotations. ECO-Concept first uses an object-centric architecture to extract semantic concepts automatically. The comprehensibility of the extracted concepts is then evaluated by large language models, and the evaluation results guide subsequent model fine-tuning to yield more understandable explanations. Experiments show that our method achieves superior performance across diverse tasks, and further concept evaluations validate that the concepts learned by ECO-Concept surpass current counterparts in comprehensibility.
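The three-stage loop the abstract describes (extract concepts, score their comprehensibility with an LLM, refine the extractor using that score as feedback) can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: `extract_concepts`, `comprehensibility_score`, and `refine` are hypothetical stand-ins (a real system would use an object-centric neural extractor, prompt an LLM for clarity ratings, and apply reinforcement fine-tuning).

```python
def extract_concepts(texts, num_concepts=3):
    # Stand-in for the object-centric extractor: we fake "concepts"
    # by striding over the sorted vocabulary. The real model would
    # learn slots that bind to semantic patterns in the text.
    words = sorted({w for t in texts for w in t.split()})
    return [words[i::num_concepts] for i in range(num_concepts)]

def comprehensibility_score(concept):
    # Stand-in for the LLM discriminator: reward non-empty, internally
    # coherent word groups (here, crudely, similar word lengths).
    # A real system would prompt an LLM to rate clarity/readability.
    if not concept:
        return 0.0
    return 1.0 / (1.0 + len(set(len(w) for w in concept)))

def refine(concepts, scores, threshold=0.3):
    # Stand-in for feedback-driven fine-tuning: instead of updating
    # model weights, simply keep the concepts the discriminator likes.
    return [c for c, s in zip(concepts, scores) if s >= threshold]

texts = ["good plot and acting", "terrible pacing but great score"]
concepts = extract_concepts(texts)
scores = [comprehensibility_score(c) for c in concepts]
refined = refine(concepts, scores)
```

In the actual framework this loop is iterative: the comprehensibility signal feeds back into training rather than filtering a fixed concept set.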