🤖 AI Summary
Existing reinforcement learning (RL) approaches for virtual machine scheduling (VMS) in large-scale cloud environments suffer from poor scalability, typically supporting only ≤10 physical machines.
Method: We propose CVD-RL, a scalable multi-agent RL framework featuring three key innovations: (i) Cluster Value Decomposition—a novel value factorization scheme that decomposes global cluster-level value into coordinated agent-specific components; (ii) temporal forward prediction to enhance long-horizon decision consistency; and (iii) a Top-k sparse action selection operator to accelerate exploration and improve generalization.
Results: Evaluated on a real-world 50-node cluster, CVD-RL achieves a 19.7% improvement in resource utilization and a 23.4% reduction in task completion latency over state-of-the-art methods. It is the first RL-based scheduler to demonstrate efficient and robust operation at the hundred-machine scale, with strong cross-scenario generalization capability.
📝 Abstract
Recent advancements in reinforcement learning (RL) have shown promise for optimizing virtual machine scheduling (VMS) in small-scale clusters. The utilization of RL to large-scale cloud computing scenarios remains notably constrained. This paper introduces a scalable RL framework, called Cluster Value Decomposition Reinforcement Learning (CVD-RL), to surmount the scalability hurdles inherent in large-scale VMS. The CVD-RL framework innovatively combines a decomposition operator with a look-ahead operator to adeptly manage representation complexities, while complemented by a Top-$k$ filter operator that refines exploration efficiency. Different from existing approaches limited to clusters of $10$ or fewer physical machines (PMs), CVD-RL extends its applicability to environments encompassing up to $50$ PMs. Furthermore, the CVD-RL framework demonstrates generalization capabilities that surpass contemporary SOTA methodologies across a variety of scenarios in empirical studies. This breakthrough not only showcases the framework's exceptional scalability and performance but also represents a significant leap in the application of RL for VMS within complex, large-scale cloud infrastructures. The code is available at https://anonymous.4open.science/r/marl4sche-D0FE.