🤖 AI Summary
In social networks, interference—where treatments assigned to one user affect outcomes of others—induces substantial bias in causal effect estimation and high policy regret in conventional randomized controlled trials (RCTs) or A/B tests.
Method: This paper proposes a community-aware online multi-armed bandit algorithm that explicitly incorporates network structural priors (i.e., community partitions) into the bandit framework. It enables efficient online decision-making while preserving causal identifiability through cluster-aware action selection and semi-synthetic interference modeling.
Contribution/Results: In simulation experiments, the proposed algorithm reduces treatment effect estimation error by 37% compared to structure-agnostic baseline bandit methods, and improves the reward-to-action ratio by 21% over RCTs. By jointly optimizing for high-fidelity causal estimation and cumulative reward maximization, it effectively alleviates the bias–reward trade-off inherent in interference-prone settings.
📝 Abstract
The gold standard for estimating causal effects is randomized controlled trial (RCT) or A/B testing where a random group of individuals from a population of interest are given treatment and the outcome is compared to a random group of individuals from the same population. However, A/B testing is challenging in the presence of interference, commonly occurring in social networks, where individuals can impact each others outcome. Moreover, A/B testing can incur a high performance loss when one of the treatment arms has a poor performance and the test continues to treat individuals with it. Therefore, it is important to design a strategy that can adapt over time and efficiently learn the total treatment effect in the network. We introduce two cluster-based multi-armed bandit (MAB) algorithms to gradually estimate the total treatment effect in a network while maximizing the expected reward by making a tradeoff between exploration and exploitation. We compare the performance of our MAB algorithms with a vanilla MAB algorithm that ignores clusters and the corresponding RCT methods on semi-synthetic data with simulated interference. The vanilla MAB algorithm shows higher reward-action ratio at the cost of higher treatment effect error due to undesired spillover. The cluster-based MAB algorithms show higher reward-action ratio compared to their corresponding RCT methods without sacrificing much accuracy in treatment effect estimation.