Graph-Dependent Regret Bounds in Multi-Armed Bandits with Interference

📅 2025-03-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper studies the multi-armed bandit (MAB) problem under network interference, where a unit's reward depends not only on its own action but also on the actions taken by its neighbors in an interference graph. This dependence makes the joint action space grow exponentially and causes classical methods to fail. To address this, the paper proposes a graph-dependent theoretical framework for regret upper and lower bounds. The method is a network-aware online algorithm grounded in local neighborhood analysis and adaptive confidence intervals, which explicitly uses graph topology to reduce regret. The authors prove that the algorithm achieves a near-optimal, graph-dependent upper bound on regret and establish the first general information-theoretic lower bounds applicable to arbitrary interference graphs. Experiments on real-world and synthetic networks show that the approach outperforms existing baseline methods.

📝 Abstract

Multi-armed bandits (MABs) are frequently used for online sequential decision-making in applications ranging from recommending personalized content to assigning treatments to patients. A recurring challenge in applying the classic MAB framework to real-world settings is that it ignores interference, where a unit's outcome depends on the treatments assigned to others. Accounting for interference leads to an exponentially growing action space, rendering standard approaches computationally impractical. We study the MAB problem under network interference, where each unit's reward depends on its own treatment and those of its neighbors in a given interference graph. We propose a novel algorithm that uses the local structure of the interference graph to minimize regret. We derive a graph-dependent upper bound on cumulative regret and show that it improves over prior work. Additionally, we provide the first lower bounds for bandits with arbitrary network interference, where each bound involves a distinct structural property of the interference graph. These bounds demonstrate that when the graph is either dense or sparse, our algorithm is nearly optimal, with upper and lower bounds that match up to logarithmic factors. We complement our theoretical results with numerical experiments, which show that our approach outperforms baseline methods.
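To make the action-space blowup in the abstract concrete, the following is a minimal counting sketch. It is not the paper's algorithm. It assumes a hypothetical path-shaped interference graph and compares the number of joint treatment assignments (arms raised to the number of units) with the number of treatment configurations each unit's reward can actually depend on (arms raised to the size of the unit's closed neighborhood). The function names and the example graph are illustrative assumptions, not taken from the paper.

```python
def path_graph(n):
    """Hypothetical example graph: units 0..n-1 connected in a line."""
    return {u: {v for v in (u - 1, u + 1) if 0 <= v < n} for u in range(n)}


def joint_action_count(adjacency, num_arms):
    """Number of ways to assign one of num_arms treatments to every unit."""
    return num_arms ** len(adjacency)


def local_configuration_counts(adjacency, num_arms):
    """Per unit, the number of treatment configurations of its closed
    neighborhood (the unit itself plus its neighbors in the graph)."""
    return {
        u: num_arms ** (len(neighbors) + 1)
        for u, neighbors in adjacency.items()
    }


if __name__ == "__main__":
    graph = path_graph(20)
    arms = 2
    print("joint assignments:", joint_action_count(graph, arms))
    local = local_configuration_counts(graph, arms)
    print("largest local count:", max(local.values()))
    print("sum of local counts:", sum(local.values()))
```

Under these assumptions, a 20-unit path with two arms has 2^20 joint assignments, while no single unit's reward depends on more than 2^3 local configurations. This gap between global and local counts is the kind of structure a graph-aware method can exploit.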

Problem

Research questions and friction points this paper is trying to address.

Addresses interference in multi-armed bandits with network effects.

Proposes algorithm using interference graph structure to minimize regret.

Provides graph-dependent upper and lower bounds on cumulative regret.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Novel algorithm uses local graph structure.

Graph-dependent upper bound on regret.

First lower bounds for network interference.
