Causal Feature Selection Method for Contextual Multi-Armed Bandits in Recommender System

📅 2024-09-20

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

Problem: In contextual multi-armed bandits (CMAB), existing high-dimensional feature selection methods rely solely on statistical correlation and neglect heterogeneous treatment effects (HTE), a critical causal quantity for optimal decision-making. This leads to suboptimal rewards, poor interpretability, and computational inefficiency. Method: We propose the first model-agnostic causal feature selection framework for CMAB, using HTE on the reward distribution as the selection criterion, thereby departing from conventional correlation-driven paradigms. Our approach integrates doubly robust estimation with causal importance scoring, requiring no assumptions about the underlying reward model. Contribution/Results: Evaluated on synthetic data and real-world online recommendation experiments (cover image optimization), our method significantly improves cumulative reward. It outperforms embedded approaches in both computational efficiency and implementation simplicity, while eliminating risks associated with model misspecification.

📝 Abstract

Features (a.k.a. context) are critical for contextual multi-armed bandit (MAB) performance. In large-scale online systems, it is important to select and implement the important features for the model: missing important features can lead to sub-optimal reward outcomes, and including irrelevant features can cause overfitting, poor model interpretability, and added implementation cost. However, feature selection methods for conventional machine learning models fall short for contextual MAB use cases, as conventional methods select features correlated with the outcome variable, but not necessarily those causing heterogeneous treatment effects among arms, which are what truly matter for contextual MAB. In this paper, we introduce model-free feature selection methods designed for the contextual MAB problem, based on the heterogeneous causal effect contributed by the feature to the reward distribution. Empirical evaluation is conducted on synthetic data as well as real data from an online experiment for optimizing content cover images in a recommender system. The results show that this feature selection method effectively selects the important features, which lead to higher contextual MAB reward than unimportant features. Compared with model-embedded methods, this model-free method has the advantages of fast computation, ease of implementation, and freedom from model mis-specification issues.
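The distinction the abstract draws between a feature that is correlated with the outcome and a feature that causes heterogeneous treatment effects can be illustrated with a small simulation. The sketch below is not from the paper: the data-generating process, the function names (`simulate`, `outcome_correlation`, `arm_effect_by_value`), and all numeric constants are assumptions chosen for illustration, with two arms assigned uniformly at random.

```python
import numpy as np


def simulate(n=20000, seed=0):
    """Hypothetical randomized two-arm logs with two binary features."""
    rng = np.random.default_rng(seed)
    x_corr = rng.integers(0, 2, n)  # raises reward equally for both arms
    x_hte = rng.integers(0, 2, n)   # decides which arm is better
    arm = rng.integers(0, 2, n)     # arm assigned uniformly at random
    # +1 when the arm "matches" x_hte, -1 otherwise
    match = (2 * x_hte - 1) * (2 * arm - 1)
    p = 0.3 + 0.3 * x_corr + 0.2 * match  # click probability in [0.1, 0.8]
    reward = rng.binomial(1, p)
    return x_corr, x_hte, arm, reward


def outcome_correlation(x, reward):
    """Pearson correlation between a feature and the reward."""
    return float(np.corrcoef(x, reward)[0, 1])


def arm_effect_by_value(x, arm, reward):
    """Mean reward of arm 1 minus arm 0, within each feature value."""
    effects = {}
    for v in np.unique(x):
        in_v = x == v
        treated = reward[in_v & (arm == 1)].mean()
        control = reward[in_v & (arm == 0)].mean()
        effects[int(v)] = float(treated - control)
    return effects


if __name__ == "__main__":
    x_corr, x_hte, arm, reward = simulate()
    for name, x in (("x_corr", x_corr), ("x_hte", x_hte)):
        print(name,
              "corr with reward:", round(outcome_correlation(x, reward), 3),
              "arm effect by value:", arm_effect_by_value(x, arm, reward))
```

In this toy setup, a correlation-based filter would rank `x_corr` first even though the arm effect is roughly the same for both of its values, so it gives a bandit no reason to choose differently. `x_hte` is nearly uncorrelated with the reward, yet the sign of the arm effect flips with its value, which is exactly the information a contextual policy needs.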

Problem

Research questions and friction points this paper is trying to address.

Identifies features driving heterogeneous treatment effects in bandits

Enhances contextual multi-armed bandit performance in recommender systems

Provides efficient model-free feature selection for dynamic environments

Innovation

Methods, ideas, or system contributions that make the work stand out.

Model-free filter methods for feature selection

HIE quantifies feature value by optimal arm changes

HDD measures reward distribution divergence across arms
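The two scores listed above can be sketched as simple filter statistics computed from randomized logs. The code below is only one possible reading of those one-line descriptions, not the paper's definitions: `hie_like_score` measures the reward gained by picking the best arm per feature value instead of a single best arm, and `hdd_like_score` measures how much larger the divergence between per-arm reward distributions becomes once it is computed within each feature value. The function names, the use of total variation distance, the binning, and the toy data are all assumptions, and every (feature value, arm) cell is assumed to be non-empty.

```python
import numpy as np


def hie_like_score(x, arm, reward):
    """Gain from choosing the best arm per feature value vs. one arm for all."""
    arms = np.unique(arm)
    single_best = max(reward[arm == a].mean() for a in arms)
    per_value_best = 0.0
    for v in np.unique(x):
        in_v = x == v
        best_here = max(reward[in_v & (arm == a)].mean() for a in arms)
        per_value_best += in_v.mean() * best_here  # weight by value frequency
    return float(per_value_best - single_best)


def _max_tv_between_arms(arm, reward, bins):
    """Largest pairwise total variation distance between per-arm histograms."""
    hists = []
    for a in np.unique(arm):
        counts, _ = np.histogram(reward[arm == a], bins=bins)
        hists.append(counts / counts.sum())
    return max(
        (0.5 * np.abs(p - q).sum()
         for i, p in enumerate(hists) for q in hists[i + 1:]),
        default=0.0,
    )


def hdd_like_score(x, arm, reward, n_bins=10):
    """Extra between-arm divergence revealed by conditioning on the feature."""
    bins = np.histogram_bin_edges(reward, bins=n_bins)
    pooled = _max_tv_between_arms(arm, reward, bins)
    conditional = sum(
        (x == v).mean() * _max_tv_between_arms(arm[x == v], reward[x == v], bins)
        for v in np.unique(x)
    )
    return float(conditional - pooled)


def toy_logs(n=20000, seed=0):
    """Hypothetical logs: x_flat shifts reward, x_flip swaps the better arm."""
    rng = np.random.default_rng(seed)
    x_flat = rng.integers(0, 2, n)
    x_flip = rng.integers(0, 2, n)
    arm = rng.integers(0, 2, n)  # uniformly randomized assignment
    p = 0.3 + 0.3 * x_flat + 0.2 * (2 * x_flip - 1) * (2 * arm - 1)
    return x_flat, x_flip, arm, rng.binomial(1, p)


if __name__ == "__main__":
    x_flat, x_flip, arm, reward = toy_logs()
    for name, x in (("x_flat", x_flat), ("x_flip", x_flip)):
        print(name, hie_like_score(x, arm, reward), hdd_like_score(x, arm, reward))
```

Both statistics need only grouped means or histograms, with no reward model to fit, which is in keeping with the model-free filter framing. On the toy logs, the feature that swaps the better arm scores clearly above zero on both, while the feature that only shifts the reward level scores near zero.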
