AI Summary
In contextual multi-armed bandits (CMAB), existing feature selection methods for high-dimensional contexts rely solely on statistical correlation and neglect heterogeneous treatment effects (HTE), a critical causal quantity for optimal decision-making. This leads to suboptimal rewards, poor interpretability, and computational inefficiency.
Method: We propose the first model-agnostic causal feature selection framework for CMAB. It uses the HTE that each feature contributes to the reward distribution as the selection criterion, departing from conventional correlation-driven paradigms. The approach integrates doubly robust estimation with causal importance scoring and makes no assumptions about the underlying reward model.
Contribution/Results: In evaluations on synthetic data and on a real-world online experiment that optimizes content cover images in a recommender system, the features selected by the method yield higher contextual MAB reward than unimportant features. Compared with model-embedded approaches, the method is faster to compute and simpler to implement, and it avoids the risk of model misspecification.
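The Method description mentions combining doubly robust estimation with causal importance scoring but does not spell out how. As a hedged illustration only, the Python sketch below assumes two arms, binary features, and logging propensities that are known exactly. It scores each feature by how much an augmented inverse-propensity-weighted (AIPW, a standard doubly robust estimator) estimate of the arm-1 minus arm-0 reward contrast changes across that feature's values. The feature names, the logging policy, and the helpers `dr_pseudo_outcome` and `causal_importance` are hypothetical and are not taken from the paper.

```python
import random
import statistics

# Illustrative sketch only: two arms, binary features, known logging propensities.
# All names (x_main, x_het, logging_propensity, ...) are hypothetical.
random.seed(1)
FEATURES = ("x_main", "x_het")

def logging_propensity(x):
    """P(arm = 1 | x) under a hypothetical, non-uniform logging policy."""
    return 0.8 if x["x_main"] == 1 else 0.2

def simulate(n):
    """Generate logged rows (context, arm, propensity of arm 1, reward)."""
    logs = []
    for _ in range(n):
        x = {f: random.randint(0, 1) for f in FEATURES}
        p1 = logging_propensity(x)
        arm = 1 if random.random() < p1 else 0
        effect = 0.5 if x["x_het"] == 1 else -0.5  # arm-1 lift depends on x_het only
        reward = 2.0 * x["x_main"] + arm * effect + random.gauss(0, 0.5)
        logs.append((x, arm, p1, reward))
    return logs

def fit_outcome_model(logs):
    """Deliberately misspecified outcome model: per-arm mean reward, ignoring x."""
    return {a: statistics.fmean(r for _, arm, _, r in logs if arm == a) for a in (0, 1)}

def dr_pseudo_outcome(arm, p1, reward, mu):
    """AIPW estimate of the arm-1 minus arm-0 reward contrast for one logged row."""
    correction = (arm * (reward - mu[1]) / p1
                  - (1 - arm) * (reward - mu[0]) / (1 - p1))
    return mu[1] - mu[0] + correction

def causal_importance(logs, feature, mu):
    """Spread of the mean DR contrast across the two values of a binary feature."""
    by_value = {0: [], 1: []}
    for x, arm, p1, reward in logs:
        by_value[x[feature]].append(dr_pseudo_outcome(arm, p1, reward, mu))
    effects = {v: statistics.fmean(g) for v, g in by_value.items()}
    return abs(effects[1] - effects[0]), effects

logs = simulate(40_000)
mu = fit_outcome_model(logs)
scores = {f: causal_importance(logs, f, mu) for f in FEATURES}
for f, (score, effects) in scores.items():
    print(f"{f}: importance = {score:.2f}, per-value arm contrast = "
          f"{effects[0]:+.2f} / {effects[1]:+.2f}")
```

Because the propensities are known exactly in this setup, the AIPW contrast stays unbiased even though the outcome model ignores the context; a better outcome model would mainly reduce variance. By construction, `x_het` should receive a much larger importance score than `x_main`, even though the logging policy depends on `x_main`.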
Abstract
Features (a.k.a. context) are critical to the performance of contextual multi-armed bandits (MAB). In large-scale online systems, it is important to select and implement the important features for the model: missing important features can lead to sub-optimal reward outcomes, while including irrelevant features can cause overfitting, poor model interpretability, and unnecessary implementation cost. However, feature selection methods for conventional machine learning models fall short for contextual MAB use cases. Conventional methods select features that are correlated with the outcome variable, but such features do not necessarily cause heterogeneous treatment effects among arms, which is what truly matters for contextual MAB. In this paper, we introduce model-free feature selection methods designed for the contextual MAB problem, based on the heterogeneous causal effect that each feature contributes to the reward distribution. Empirical evaluation is conducted on synthetic data as well as real data from an online experiment for optimizing content cover images in a recommender system. The results show that this feature selection method effectively selects important features that lead to higher contextual MAB reward than unimportant features. Compared with model-embedded methods, this model-free method has the advantages of fast computation, ease of implementation, and freedom from model misspecification issues.
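To make the distinction between correlation and heterogeneous treatment effects concrete, the following Python sketch builds a hypothetical synthetic dataset (not the paper's) with uniformly random arm assignment. One feature, `x_main`, raises the reward of both arms equally, so it is strongly correlated with the reward but induces no HTE; the other, `x_het`, flips which arm is better. The per-value difference-of-means score is an illustrative stand-in, not the paper's selection criterion.

```python
import math
import random
import statistics

# Hypothetical synthetic data (not the paper's): two binary features, two arms,
# uniformly random arm assignment.
#   x_main shifts the reward of both arms equally: strongly correlated, no HTE.
#   x_het flips which arm is better: weakly correlated, strong HTE.
random.seed(0)

def simulate(n):
    """Generate rows (x_main, x_het, arm, reward)."""
    rows = []
    for _ in range(n):
        x_main, x_het = random.randint(0, 1), random.randint(0, 1)
        arm = random.randint(0, 1)
        effect = 0.5 if x_het == 1 else -0.5  # lift of arm 1 over arm 0
        reward = 2.0 * x_main + arm * effect + random.gauss(0, 0.5)
        rows.append((x_main, x_het, arm, reward))
    return rows

def correlation_score(rows, j):
    """Conventional criterion: |Pearson correlation| between feature j and reward."""
    xs = [r[j] for r in rows]
    ys = [r[3] for r in rows]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return abs(cov) / math.sqrt(vx * vy)

def hte_score(rows, j):
    """Bandit-oriented criterion: how much the arm-1 minus arm-0 mean reward
    difference changes between the two values of feature j."""
    effects = []
    for v in (0, 1):
        arm1 = [r[3] for r in rows if r[j] == v and r[2] == 1]
        arm0 = [r[3] for r in rows if r[j] == v and r[2] == 0]
        effects.append(statistics.fmean(arm1) - statistics.fmean(arm0))
    return abs(effects[1] - effects[0])

rows = simulate(20_000)
features = {"x_main": 0, "x_het": 1}
corr = {name: correlation_score(rows, j) for name, j in features.items()}
hte = {name: hte_score(rows, j) for name, j in features.items()}
for name in features:
    print(f"{name}: |corr| = {corr[name]:.2f}, HTE score = {hte[name]:.2f}")
```

By construction, the correlation criterion ranks `x_main` above `x_het`, while the HTE score ranks `x_het` first. Only `x_het` changes which arm a contextual MAB policy should pull.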