🤖 AI Summary
This study addresses a limitation of existing explainable artificial intelligence (XAI) methods: their outputs are hard for non-expert users in network operations to turn into actionable insights. The authors propose a framework that combines SHAP feature influence values with feature interaction information and passes both to a moderately sized large language model (LLM) through a structured prompt, which generates natural language explanations. In an evaluation with human evaluators on an optical quality of transmission (QoT) estimation task, the method improves explanation usefulness by 12.2% and explanation scope by 6.2% over a baseline that uses only SHAP values in a straightforward prompt, while achieving 97.5% correctness.
📝 Abstract
As artificial intelligence and machine learning (AI/ML) models become integral to network operations, their lack of transparency poses a significant barrier to operator trust. Existing explainable artificial intelligence (XAI) techniques often fail to bridge this gap for non-specialists, producing technical outputs that are difficult to translate into actionable insights. This paper presents a framework specifically designed to address this shortcoming. It leverages a moderately sized large language model (LLM) and extends beyond the standard use of SHapley Additive exPlanations (SHAP) feature influence values. The framework employs a structured prompt enriched with mutual feature interaction data to generate human-understandable natural language explanations. To validate our framework, we performed an empirical evaluation on an optical quality of transmission (QoT) estimation use case with human evaluators. We collected independent performance evaluations from specialists, which showed high inter-evaluator agreement. Compared to a state-of-the-art baseline that uses only SHAP feature influence values in a straightforward prompt, our approach improves explanation usefulness and scope by 12.2% and 6.2%, respectively, while achieving 97.5% correctness.
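The abstract does not give the prompt layout, so the following is only a minimal sketch of what a structured prompt combining SHAP influence values with pairwise feature interaction data might look like. The function name `build_prompt`, the section headings, the feature names, and all numeric values are hypothetical and chosen purely for illustration; they are not taken from the paper.

```python
def build_prompt(prediction, shap_values, interactions, top_k=2):
    """Format SHAP influences and pairwise interactions as a structured prompt.

    shap_values:  dict mapping feature name -> SHAP value (hypothetical input)
    interactions: dict mapping (feature_a, feature_b) -> interaction value
    top_k:        number of strongest interactions to include
    """
    # Rank individual features by the magnitude of their influence.
    ranked = sorted(shap_values.items(), key=lambda kv: abs(kv[1]), reverse=True)
    # Keep only the strongest pairwise interactions.
    pairs = sorted(interactions.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top_k]

    lines = [
        "## Task",
        "Explain the prediction below to a non-specialist network operator.",
        "",
        "## Prediction",
        str(prediction),
        "",
        "## Feature influences (SHAP)",
    ]
    for name, value in ranked:
        if value > 0:
            effect = "raises the estimate"
        elif value < 0:
            effect = "lowers the estimate"
        else:
            effect = "does not change the estimate"
        lines.append(f"- {name}: {value:+.2f} ({effect})")

    lines += ["", "## Feature interactions"]
    for (a, b), value in pairs:
        lines.append(f"- {a} x {b}: {value:+.2f}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up values for illustration only; not results from the paper.
    prompt = build_prompt(
        prediction="Estimated QoT: acceptable",
        shap_values={"path_length_km": -0.42, "launch_power_dbm": 0.18},
        interactions={("path_length_km", "launch_power_dbm"): 0.07},
    )
    print(prompt)
```

The resulting string would be sent to the LLM as the user prompt. A baseline in the spirit of the one described above would simply omit the interactions section.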