Model-Based Learning of Whittle indices

📅 2025-11-25
🤖 AI Summary
This paper addresses the problem of efficiently learning the Whittle indices of indexable, communicating, and unichain Markov decision processes (MDPs). We propose BLINQ, a new model-based method that couples model learning with Whittle index computation: it builds an empirical estimate of the MDP and computes that estimate's Whittle indices with an extended version of an existing state-of-the-art algorithm, so the indices are learned without neural networks. We prove convergence to the true Whittle indices and bound the time needed to learn them to arbitrary precision. In numerical experiments, BLINQ needs substantially fewer samples than existing Q-learning approaches to reach an accurate approximation, and its total computational cost is lower for any reasonably high number of samples, even when Q-learning is accelerated with pre-trained neural networks. BLINQ thus offers an efficient model-based alternative for restless multi-armed bandit (RMAB) problems under resource constraints.

📝 Abstract
We present BLINQ, a new model-based algorithm that learns the Whittle indices of an indexable, communicating and unichain Markov Decision Process (MDP). Our approach relies on building an empirical estimate of the MDP and then computing its Whittle indices using an extended version of an existing state-of-the-art algorithm. We provide a proof of convergence to the Whittle indices we want to learn as well as a bound on the time needed to learn them with arbitrary precision. We also analyze the computational complexity of BLINQ. Our numerical experiments suggest that BLINQ significantly outperforms existing Q-learning approaches in terms of the number of samples needed to get an accurate approximation. In addition, its total computational cost is lower than that of Q-learning for any reasonably high number of samples. These observations persist even when the Q-learning algorithms are sped up using pre-trained neural networks to predict Q-values.
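
The first step described in the abstract, building an empirical estimate of the MDP, can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `empirical_mdp`, the `(s, a, r, s_next)` sample format, and the uniform fallback for unvisited state-action pairs are all assumptions made for illustration.

```python
def empirical_mdp(samples, n_states, n_actions=2):
    """Estimate transition probabilities and mean rewards from samples.

    samples: iterable of (s, a, r, s_next) tuples (hypothetical format).
    Returns (P, R) with P[a][s][s_next] and R[a][s].
    """
    # counts[a][s][s_next] = number of observed transitions s -> s_next under a
    counts = [[[0] * n_states for _ in range(n_states)] for _ in range(n_actions)]
    reward_sums = [[0.0] * n_states for _ in range(n_actions)]
    for s, a, r, s_next in samples:
        counts[a][s][s_next] += 1
        reward_sums[a][s] += r

    P = [[None] * n_states for _ in range(n_actions)]
    R = [[0.0] * n_states for _ in range(n_actions)]
    for a in range(n_actions):
        for s in range(n_states):
            n = sum(counts[a][s])
            if n == 0:
                # Assumption: an unvisited pair gets a uniform row and zero reward.
                P[a][s] = [1.0 / n_states] * n_states
            else:
                P[a][s] = [c / n for c in counts[a][s]]
                R[a][s] = reward_sums[a][s] / n
    return P, R


# Usage: two states, actions 0 (passive) and 1 (active).
samples = [(0, 1, 1.0, 1), (0, 1, 0.0, 0), (1, 0, 0.5, 1)]
P, R = empirical_mdp(samples, n_states=2)
```

The estimates `P` and `R` would then be handed to a Whittle index solver; the solver itself is left out here because the abstract does not specify it beyond "an extended version" of an existing algorithm.
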
Problem

Research questions and friction points this paper is trying to address.

Learning Whittle indices for indexable communicating unichain MDPs
Providing convergence guarantees and complexity bounds for learning
Outperforming Q-learning approaches in sample efficiency and computational cost
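
For reference, the quantity being learned can be written down under one common convention for average-reward restless bandits. The notation below (rewards r(s,a), transitions P(s'|s,a), activation penalty λ, gain g and bias h) is assumed for illustration and may differ from the paper's own.

```latex
% Assumed notation: one arm with states s, actions a in {0 (passive), 1 (active)},
% reward r(s,a), transition kernel P(s' | s,a), and a penalty \lambda per activation.
% Under the unichain assumption, the \lambda-penalized MDP has a gain g^\lambda
% and a bias h^\lambda satisfying the Bellman equation
g^\lambda + h^\lambda(s)
  = \max_{a \in \{0,1\}} \Big[ r(s,a) - \lambda a
      + \sum_{s'} P(s' \mid s,a)\, h^\lambda(s') \Big].

% For an indexable arm, the Whittle index \lambda(s) of state s is the penalty
% at which both actions are optimal in s:
r(s,1) - \lambda(s) + \sum_{s'} P(s' \mid s,1)\, h^{\lambda(s)}(s')
  = r(s,0) + \sum_{s'} P(s' \mid s,0)\, h^{\lambda(s)}(s').
```

Indexability is what makes this well defined: as λ increases, the set of states in which the active action is optimal can only shrink, so each state has a single crossing point.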
Innovation

Methods, ideas, or system contributions that make the work stand out.

Model-based algorithm learns Whittle indices
Builds empirical MDP estimate for computation
Outperforms Q-learning in sample efficiency
Joël Charles-Rebuffé
Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, France
Nicolas Gast
Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, France
Bruno Gaujal
Unknown affiliation