🤖 AI Summary
Existing automatic humour style identification models (e.g., ALI+XGBoost) in mental health applications suffer from poor interpretability, hindering clinical trust and deployment.
Method: Building on prior work, this study systematically applies eXplainable AI (XAI) techniques, including SHAP, LIME, error pattern analysis, and fine-grained case attribution, to dissect the model's decision logic. We jointly analyse linguistic, affective, and semantic features to identify the determinants of affiliative, aggressive, and other humour styles.
Contribution/Results: We identify affective ambiguity, contextual misinterpretation, and target misidentification as the primary error sources; notably, affiliative humour is the hardest style to interpret because its meaning is largely implicit. Our approach yields high-fidelity feature importance rankings and traceable, instance-level misclassification attributions. This substantially improves model transparency and provides a trustworthy AI foundation for mental health interventions, content safety moderation, and digital humanities research.
📝 Abstract
Humour styles can have either a negative or a positive impact on well-being. Given the importance of these styles to mental health, significant research has been conducted on their automatic identification. However, the automated machine learning models used for this purpose are black boxes, making their prediction decisions opaque. Clarity and transparency are vital in the field of mental health. This paper presents an explainable AI (XAI) framework for understanding humour style classification, building upon previous work in computational humour analysis. Using the best-performing single model (ALI+XGBoost) from prior research, we apply comprehensive XAI techniques to analyse how linguistic, emotional, and semantic features contribute to humour style classification decisions. Our analysis reveals distinct patterns in how different humour styles are characterised and misclassified, with particular emphasis on the challenges in distinguishing affiliative humour from other styles. Through detailed examination of feature importance, error patterns, and misclassification cases, we identify key factors influencing model decisions, including emotional ambiguity, context misinterpretation, and target identification. The framework demonstrates significant utility in understanding model behaviour, yielding interpretable insights into the complex interplay of features that define different humour styles. Our findings contribute to both the theoretical understanding of computational humour analysis and practical applications in mental health, content moderation, and digital humanities research.
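The black-box concern raised above is usually probed with model-agnostic attribution methods such as SHAP and LIME. As a self-contained illustration of the underlying idea only (not the paper's actual pipeline, which uses SHAP/LIME on an ALI+XGBoost model), the sketch below uses permutation importance, a simpler model-agnostic cousin: shuffle one feature column at a time and measure how much a classifier's accuracy drops. The toy classifier and the feature names (sentiment, incongruity, self-reference) are hypothetical stand-ins.

```python
import random

# Hypothetical stand-in for a black-box humour-style classifier; the real
# model in the paper is ALI+XGBoost. Feature names are illustrative only.
def black_box_predict(x):
    sentiment, incongruity, self_ref = x
    # Toy rule: "affiliative" (1) = positive tone without self-deprecation.
    # Note the incongruity feature is deliberately never used.
    return 1 if sentiment > 0.5 and self_ref < 0.5 else 0

def accuracy(X, y):
    return sum(black_box_predict(x) == t for x, t in zip(X, y)) / len(y)

def permutation_importance(X, y, n_repeats=20, seed=0):
    """Importance of feature j = mean accuracy drop when column j is shuffled."""
    rng = random.Random(seed)
    base = accuracy(X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [x[j] for x in X]
            rng.shuffle(col)  # break the link between feature j and the label
            X_perm = [x[:j] + (v,) + x[j + 1:] for x, v in zip(X, col)]
            drops.append(base - accuracy(X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

rng = random.Random(42)
X = [(rng.random(), rng.random(), rng.random()) for _ in range(200)]
y = [black_box_predict(x) for x in X]  # labels = model output, so base accuracy is 1.0
imp = permutation_importance(X, y)
```

On this toy setup, the unused incongruity feature gets zero importance while sentiment and self-reference both show large accuracy drops, mirroring how attribution rankings expose which features a model actually relies on.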