🤖 AI Summary
Existing model-agnostic feature importance methods struggle to capture high-order interactions and to disentangle overlapping contributions among features. To address this, we propose a higher-order feature effect decomposition framework grounded in conditional mutual information (CMI), which splits each feature's contribution into three information components: unique, synergistic, and redundant. Our method estimates CMI with a k-nearest-neighbor (kNN) estimator that handles mixed discrete and continuous variables, so the decomposition does not depend on any particular predictive model. On synthetic Gaussian datasets, where the ground-truth interaction structure can be derived analytically, the framework recovers the expected patterns; it is further tested on non-Gaussian data and on real-world TCGA-BRCA gene expression data. The results suggest that the approach can support interaction-aware feature selection and interpretable modeling. By formalizing feature dependencies through an information-theoretic lens, the work offers a principled way to describe higher-order statistical dependencies.
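For readers less familiar with the quantity involved, the standard textbook definition of CMI is sketched below. The symbols $Y$ (target), $X_i$ (feature of interest), and $Z$ (a conditioning subset of the remaining features) are notation assumed here for illustration and are not taken from the paper, whose exact definitions of the unique, synergistic, and redundant terms may differ.

```latex
% Conditional mutual information as a difference of conditional entropies
I(Y; X_i \mid Z) = H(Y \mid Z) - H(Y \mid X_i, Z)

% Effect of conditioning on Z, relative to the unconditioned information
\Delta_Z = I(Y; X_i \mid Z) - I(Y; X_i)
% \Delta_Z > 0 : Z reveals information that X_i alone does not show (synergy-like)
% \Delta_Z < 0 : Z already carries part of the information in X_i (redundancy-like)
```

The sign of $\Delta_Z$ is a common informal indicator of synergy versus redundancy; it is shown only to give intuition for why conditioning on other features separates the three kinds of contribution.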
📝 Abstract
Understanding the contribution of individual features in predictive models remains a central goal in interpretable machine learning. Many model-agnostic methods exist to estimate feature importance, but they often fall short in capturing high-order interactions and disentangling overlapping contributions. In this work, we present an information-theoretic extension of the High-order interactions for Feature importance (Hi-Fi) method, leveraging Conditional Mutual Information (CMI) estimated via a k-Nearest Neighbor (kNN) approach that works on mixed discrete and continuous random variables. Our framework decomposes feature contributions into unique, synergistic, and redundant components, offering a richer, model-independent understanding of their predictive roles. We validate the method on synthetic datasets with known Gaussian structures, where ground-truth interaction patterns are analytically derived, and further test it on non-Gaussian data and on real-world gene expression data from TCGA-BRCA. Results indicate that the proposed estimator accurately recovers the theoretical and expected findings, suggesting potential use in feature selection algorithms and in model development based on interaction analysis.
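To illustrate how ground truth can be derived analytically in the Gaussian case, the following minimal sketch computes CMI in closed form from a covariance matrix, using the standard identity I(X;Y|Z) = ½ [log det Σ_XZ + log det Σ_YZ − log det Σ_Z − log det Σ_XYZ]. It is not the paper's kNN estimator. The function names (`det`, `gaussian_cmi`) and the two toy covariance matrices are assumptions made up for this example.

```python
import math

def det(m):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d  # a row swap flips the sign
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d

def logdet(cov, idx):
    """Log-determinant of the covariance block over idx (0 for the empty set)."""
    if not idx:
        return 0.0
    block = [[cov[i][j] for j in idx] for i in idx]
    return math.log(det(block))  # requires a positive-definite block

def gaussian_cmi(cov, x, y, z=()):
    """I(X;Y|Z) in nats for jointly Gaussian variables, given index lists."""
    x, y, z = list(x), list(y), list(z)
    return 0.5 * (logdet(cov, x + z) + logdet(cov, y + z)
                  - logdet(cov, z) - logdet(cov, x + y + z))

# Variable order in both toy examples: (X1, X2, Y).
# Synergy-like case (assumed): X1, X2 independent, Y = X1 + X2 + unit noise.
cov_syn = [[1.0, 0.0, 1.0],
           [0.0, 1.0, 1.0],
           [1.0, 1.0, 3.0]]
# Redundancy-like case (assumed): X1 is a noisy copy of X2, Y = X2 + unit noise.
cov_red = [[1.1, 1.0, 1.0],
           [1.0, 1.0, 1.0],
           [1.0, 1.0, 2.0]]

syn_marginal = gaussian_cmi(cov_syn, [0], [2])          # I(X1;Y)
syn_conditional = gaussian_cmi(cov_syn, [0], [2], [1])  # I(X1;Y|X2)
red_marginal = gaussian_cmi(cov_red, [0], [2])          # I(X1;Y)
red_conditional = gaussian_cmi(cov_red, [0], [2], [1])  # I(X1;Y|X2)

print("synergy-like:   ", syn_marginal, syn_conditional)
print("redundancy-like:", red_marginal, red_conditional)
```

Under these assumed covariances, conditioning on `X2` increases the information that `X1` carries about `Y` in the first case and drives it to zero (up to floating-point error) in the second. Closed-form values of this kind are what a sample-based estimator can be checked against on synthetic Gaussian data.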