Online Clustering with Bandit Information

📅 2025-01-20

🤖 AI Summary

This paper studies online clustering in the multi-armed bandit framework under the fixed-confidence setting. There are $M$ arms whose samples are multivariate Gaussian with unknown means and known unit covariance. The goal is to recover the partition of the arms into $K$ clusters, defined by single-linkage clustering of the true means, with probability at least $1-\delta$ while drawing as few samples as possible. It is the first work on bandit online clustering to handle $K>2$ clusters and to allow arms in the same cluster to have different means, removing two key limitations of prior work. The authors propose two algorithms: (i) ATBOC, which is order optimal: as $\delta \to 0$, the upper bound on its expected sample complexity is at most twice an instance-dependent lower bound; and (ii) LUCBBOC, which is computationally cheaper and performs comparably to ATBOC in simulations. Both combine the single-linkage clustering criterion with adaptive sampling, using average tracking (ATBOC) or LCB/UCB-based exploration (LUCBBOC), together with refined mean estimation. Experiments on synthetic data and the MovieLens dataset support the theoretical guarantees and show practical efficacy.
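
One way to read the factor-of-2 statement, assuming the lower bound takes the usual fixed-confidence form: let $\tau_\delta$ be the random total number of samples a $\delta$-correct algorithm draws before it stops, and let $T^*(\boldsymbol{\mu})$ be a placeholder name for the instance-dependent constant in the lower bound (its exact expression is given in the paper and is not reproduced here). The two asymptotic claims then read

$$\liminf_{\delta \to 0} \frac{\mathbb{E}[\tau_\delta]}{\log(1/\delta)} \ge T^*(\boldsymbol{\mu}) \quad \text{for any } \delta\text{-correct algorithm}, \qquad \limsup_{\delta \to 0} \frac{\mathbb{E}[\tau_\delta^{\mathrm{ATBOC}}]}{\log(1/\delta)} \le 2\,T^*(\boldsymbol{\mu}).$$

In words: for small $\delta$, both bounds grow like $\log(1/\delta)$, and ATBOC's leading constant is at most twice the best achievable one.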

📝 Abstract

We study the problem of online clustering within the multi-armed bandit framework under the fixed-confidence setting. In this multi-armed bandit problem, we have $M$ arms, each providing i.i.d. samples that follow a multivariate Gaussian distribution with an unknown mean and a known unit covariance. The arms are grouped into $K$ clusters by applying the Single Linkage (SLINK) clustering algorithm to the arms' means, so the clusters are determined by the distances between those means. Since the true means are unknown, the objective is to recover this clustering with the minimum number of samples drawn from the arms, subject to an upper bound on the error probability. We introduce a novel algorithm, Average Tracking Bandit Online Clustering (ATBOC), and prove that it is order optimal: as $\delta \rightarrow 0$, the upper bound on its expected sample complexity for error probability $\delta$ is within a factor of 2 of an instance-dependent lower bound. Furthermore, we propose a computationally more efficient algorithm, Lower and Upper Confidence Bound-based Bandit Online Clustering (LUCBBOC), inspired by the LUCB algorithm for best arm identification. Simulation results demonstrate that the performance of LUCBBOC is comparable to that of ATBOC. We assess the effectiveness of the proposed algorithms through numerical experiments on both synthetic datasets and the real-world MovieLens dataset. To the best of our knowledge, this is the first work on bandit online clustering that allows arms in the same cluster to have different means and allows $K$ greater than 2.
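
The SLINK rule that defines the target partition can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: single linkage with $K$ clusters is computed here with Kruskal's algorithm (repeatedly merge the closest pair of still-separate groups until $K$ groups remain, which is the same as cutting the $K-1$ longest edges of a minimum spanning tree). The function name `slink_partition` is hypothetical, Euclidean distance between means is assumed, and exact distance ties at the cut are broken by arm index, which the paper may handle differently.

```python
import math
from itertools import combinations


def slink_partition(means, k):
    """Hypothetical helper: single-linkage (SLINK) partition of arm means into k clusters.

    Runs Kruskal's algorithm on the complete graph of pairwise Euclidean
    distances and stops once k components remain. Distance ties are broken
    by arm index (an assumption). Returns frozensets of arm indices, sorted
    by their smallest index.
    """
    m = len(means)
    if not 1 <= k <= m:
        raise ValueError("need 1 <= k <= number of arms")
    parent = list(range(m))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Every pair of arms, ordered by the distance between their means.
    edges = sorted(
        (math.dist(means[i], means[j]), i, j)
        for i, j in combinations(range(m), 2)
    )
    components = m
    for _, i, j in edges:
        if components == k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj  # merge the two closest still-separate groups
            components -= 1

    groups = {}
    for i in range(m):
        groups.setdefault(find(i), set()).add(i)
    return sorted((frozenset(g) for g in groups.values()), key=min)


# Illustrative means (made up for this sketch), six arms in 2-D.
means = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0), (30.0, 0.0)]
print(slink_partition(means, 3))
# [frozenset({0, 1, 2}), frozenset({3, 4}), frozenset({5})]
```

In the example, arms 0, 1 and 2 have different means yet form one cluster through chaining, which is the kind of instance the paper's setting allows. A bandit algorithm only sees noisy sample means, so applying such a function to empirical means gives a plug-in estimate of the partition; roughly speaking, ATBOC and LUCBBOC decide adaptively how to allocate samples and when to stop so that this estimate is correct with probability at least $1-\delta$.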

Problem

Research questions and friction points this paper is trying to address.

Clustering

Random Streams

Optimization

Innovation

Methods, ideas, or system contributions that make the work stand out.

Online Clustering

Adaptive Algorithms

Multi-center Classification

🔎 Similar Papers

A General Framework for Clustering and Distribution Matching with Bandit Feedback