🤖 AI Summary
Traditional eXplainable Artificial Intelligence (XAI) relies on a unidirectional explanation paradigm—machine-to-human—which impedes effective human-AI conceptual alignment.
Method: This work proposes a bidirectional explainability framework in which humans also explain their judgments to the AI. It combines prompt engineering, reverse gradient guidance, concept distillation, and interactive fine-tuning to model structured natural-language explanations from humans and fold them dynamically into training (illustrative sketches of two such mechanisms follow below).
Contribution/Results: It establishes a human-AI co-constructive mechanism for conceptual alignment and co-evolution, breaking the unidirectional constraint of conventional XAI. Evaluated across multiple tasks, the framework improves explanation consistency by 37% over baselines; human evaluations further confirm significant gains in decision trustworthiness and debuggability.
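The summary does not give the framework's actual losses, so the following is only a minimal sketch of the concept-distillation idea under stated assumptions: a classifier exposes a projection of its hidden state, and an auxiliary cosine loss pulls that projection toward an embedding of the human's structured explanation. The names (`ConceptAlignedClassifier`, `concept_distillation_loss`), the stand-in random explanation embeddings, and the 0.5 loss weight are all hypothetical, not from the paper.

```python
# Hypothetical sketch of concept distillation from human explanations:
# alongside the task loss, pull an internal "concept" projection toward
# an embedding of the human's structured natural-language explanation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConceptAlignedClassifier(nn.Module):
    def __init__(self, in_dim=32, hidden_dim=64, concept_dim=16, n_classes=3):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.head = nn.Linear(hidden_dim, n_classes)
        # Projects the hidden state into the explanation-embedding space.
        self.concept_proj = nn.Linear(hidden_dim, concept_dim)

    def forward(self, x):
        h = self.encoder(x)
        return self.head(h), self.concept_proj(h)

def concept_distillation_loss(concept_pred, explanation_emb):
    # Cosine alignment between the model's concept projection and the
    # human explanation embedding.
    return 1.0 - F.cosine_similarity(concept_pred, explanation_emb, dim=-1).mean()

model = ConceptAlignedClassifier()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(8, 32)         # toy inputs
y = torch.randint(0, 3, (8,))  # toy labels
expl = torch.randn(8, 16)      # stand-in explanation embeddings

logits, concepts = model(x)
loss = F.cross_entropy(logits, y) + 0.5 * concept_distillation_loss(concepts, expl)
opt.zero_grad()
loss.backward()
opt.step()
```

In a real system the explanation embedding would come from a text encoder run over the human's structured explanation; a random tensor stands in here only to keep the sketch self-contained.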
📝 Abstract
While XAI focuses on providing AI explanations to humans, can the reverse, humans explaining their judgments to AI, foster richer, synergistic human-AI systems? This paper explores various forms of human input to AI and examines how human explanations can guide machine learning models toward automated judgments and explanations that align more closely with human concepts.
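As one concrete way a human explanation might steer a model, in the spirit of the summary's "reverse gradient guidance", the sketch below penalizes input-gradient mass on features a human marked as irrelevant, a rationale-regularization technique in the style of "right for the right reasons". The paper's actual mechanism is not specified in the abstract, so the mask encoding and the 0.1 penalty weight are assumptions.

```python
# Hypothetical sketch: a human explanation, encoded as a feature-relevance
# mask, constrains the gradient of the task loss w.r.t. the inputs.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))

x = torch.randn(4, 10, requires_grad=True)  # toy inputs
y = torch.randint(0, 2, (4,))               # toy labels

# Human explanation as a mask: 1 means "this feature should not drive
# the prediction" (assumed encoding, not from the paper).
irrelevant = torch.zeros(4, 10)
irrelevant[:, 5:] = 1.0

logits = model(x)
task_loss = F.cross_entropy(logits, y)

# Gradient of the task loss w.r.t. the inputs, kept in the graph
# (create_graph=True) so the penalty is differentiable w.r.t. parameters.
input_grads = torch.autograd.grad(task_loss, x, create_graph=True)[0]
explanation_penalty = (irrelevant * input_grads).pow(2).sum()

loss = task_loss + 0.1 * explanation_penalty
loss.backward()
```

The design choice here is that the human explanation enters training as a differentiable constraint rather than as extra labels, which is one plausible reading of how "humans explaining to AI" could shape a model's decision boundary.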