AI Summary
This work addresses the problem of determining whether the root value of a tree in Monte Carlo tree search is at least a given threshold, where each internal node is labeled MAX or MIN and takes the maximum or minimum of its child values, and each leaf value is the mean of an unknown distribution. The authors propose a δ-correct sequential sampling algorithm built on the Track-and-Stop framework, with a ratio-based modification of the D-Tracking strategy for arm selection. The method retains asymptotically optimal sample complexity while substantially reducing the number of samples required in practice. It also lowers the per-round computational cost from linear to logarithmic in the number of arms. Empirical evaluations show the algorithm's advantages in both sample efficiency and computational speed.
Abstract
We introduce the Thresholding Monte Carlo Tree Search problem, in which, given a tree $\mathcal{T}$ and a threshold $\theta$, a player must answer whether the root node value of $\mathcal{T}$ is at least $\theta$ or not. In the given tree, `MAX' or `MIN' is labeled on each internal node, and the value of a `MAX'-labeled (`MIN'-labeled) internal node is the maximum (minimum) of its child values. The value of a leaf node is the mean reward of an unknown distribution, from which the player can sample rewards. For this problem, we develop a $\delta$-correct sequential sampling algorithm based on the Track-and-Stop strategy that has asymptotically optimal sample complexity. We show that a ratio-based modification of the D-Tracking arm-pulling strategy leads to a substantial improvement in empirical sample complexity, as well as reducing the per-round computational cost from linear to logarithmic in the number of arms.
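To make the problem statement concrete, the following is a minimal Python sketch of how the root value of a MAX/MIN tree is defined and compared with the threshold. It assumes the leaf means are known, which is exactly what the player does not have in the actual problem, so it illustrates only the quantity being tested and not the sampling algorithm. The names `Node`, `node_value`, and `root_at_least` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical container: label is "MAX" or "MIN" for internal nodes,
    # None for leaves; mean is set only on leaves.
    label: Optional[str] = None
    children: List["Node"] = field(default_factory=list)
    mean: Optional[float] = None


def node_value(node: Node) -> float:
    """Value of a node: leaf mean, or the max/min of its child values."""
    if not node.children:
        return node.mean
    values = [node_value(child) for child in node.children]
    return max(values) if node.label == "MAX" else min(values)


def root_at_least(root: Node, theta: float) -> bool:
    """The question the player must answer, given the (here known) means."""
    return node_value(root) >= theta


# A depth-2 example: a MAX root over two MIN nodes, each with two leaves.
tree = Node("MAX", [
    Node("MIN", [Node(mean=0.6), Node(mean=0.3)]),
    Node("MIN", [Node(mean=0.5), Node(mean=0.7)]),
])

print(node_value(tree))           # max(min(0.6, 0.3), min(0.5, 0.7))
print(root_at_least(tree, 0.4))
print(root_at_least(tree, 0.55))
```

In the setting of the abstract, the leaf means are unknown and can only be estimated from sampled rewards, so a $\delta$-correct algorithm must decide when it has sampled enough to answer this question with error probability at most $\delta$.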