🤖 AI Summary
This study addresses the limited reliability and diagnostic capability of existing AutoClustering systems, which stem from a lack of interpretability regarding how meta-features influence the selection of clustering algorithms and hyperparameters. For the first time, the work systematically reviews the meta-features employed across 22 AutoClustering methods and organizes them into a coherent taxonomy. By integrating global interpretability through decision predicate graphs and local interpretability via SHAP values, the authors conduct a thorough analysis of feature contributions within meta-models. Their investigation uncovers structural biases and consistent patterns in current meta-learning strategies, revealing fundamental limitations of prevailing approaches. These insights not only expose critical shortcomings but also offer actionable interpretability guidelines for designing transparent and trustworthy unsupervised AutoML systems.
📝 Abstract
AutoClustering methods aim to automate unsupervised learning tasks, including algorithm selection (AS), hyperparameter optimization (HPO), and pipeline synthesis (PS), often by leveraging meta-learning over dataset meta-features. While these systems frequently achieve strong performance, their recommendations are difficult to justify: the influence of dataset meta-features on algorithm and hyperparameter choices is typically not exposed, which limits reliability, bias diagnostics, and efficient meta-feature engineering. In this work, we investigate the explainability of meta-models in AutoClustering. We first review 22 existing methods and organize their meta-features into a structured taxonomy. We then apply a global explainability technique (Decision Predicate Graphs) to assess feature importance within the meta-models of selected frameworks. Finally, we use local explainability tools such as SHAP (SHapley Additive exPlanations) to analyse specific clustering decisions. Our findings highlight consistent patterns in meta-feature relevance, identify structural weaknesses in current meta-learning strategies that can distort recommendations, and provide actionable guidance for more interpretable Automated Machine Learning (AutoML) design. This study therefore offers a practical foundation for increasing decision transparency in unsupervised learning automation.
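To make the local-explainability step concrete: SHAP attributes a single prediction to individual input features via Shapley values. Below is a minimal, self-contained sketch (not the authors' code) that computes exact Shapley values for a toy "meta-model" scoring one clustering algorithm's suitability from a handful of hypothetical meta-features; the feature names and scoring rules are illustrative assumptions only.

```python
from itertools import combinations
from math import factorial

# Toy meta-model: scores the suitability of one clustering algorithm
# from three hypothetical meta-features (names and rules are illustrative).
def meta_model(features):
    score = 0.0
    score += 0.3 if features.get("n_samples", 0) > 1000 else 0.0
    score += 0.5 if features.get("dimensionality", 100) < 50 else 0.0
    score -= 0.2 * features.get("sparsity", 0.0)
    return score

def shapley_values(model, instance, baseline):
    """Exact Shapley values: each feature's average marginal contribution
    over all coalitions of the other features (tractable for few features).
    Absent features are replaced by their baseline values."""
    names = list(instance)
    n = len(names)
    phi = {}
    for f in names:
        others = [x for x in names if x != f]
        total = 0.0
        for size in range(n):
            for subset in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                present = {k: instance[k] for k in subset}
                with_f = {**baseline, **present, f: instance[f]}
                without_f = {**baseline, **present}
                total += weight * (model(with_f) - model(without_f))
        phi[f] = total
    return phi

x = {"n_samples": 5000, "dimensionality": 10, "sparsity": 0.5}
base = {"n_samples": 0, "dimensionality": 100, "sparsity": 0.0}
vals = shapley_values(meta_model, x, base)
```

By the efficiency axiom, the attributions sum to `meta_model(x) - meta_model(base)`, so one can read off how much each meta-feature pushed this particular recommendation up or down; the SHAP library applies the same idea with fast approximations for real models.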