🤖 AI Summary
This paper addresses the sequential repair scheduling problem for multi-agent monotonic MDPs under dual hard constraints: a global budget (total number of repairs) and a capacity limit (maximum parallel repairs). The combinatorial complexity grows exponentially with agent scale. We propose a novel two-stage decoupling framework: Stage I formulates component grouping and budget allocation as a capacity-aware linear sum assignment problem (LSAP); Stage II employs meta-trained Proximal Policy Optimization (PPO) to learn efficient, decentralized repair policies for each subgroup. To our knowledge, this is the first approach that jointly enforces both hard constraints in monotonic MDPs while preserving theoretical interpretability and computational scalability. Evaluated on industrial robotic cluster maintenance tasks, our method improves average uptime by 23.5% in large-scale scenarios (>100 robots), significantly outperforming baseline methods.
📝 Abstract
Many real-world sequential repair problems can be effectively modeled using monotonic Markov Decision Processes (MDPs), where the system state stochastically decreases and can only be increased by performing a restorative action. This work addresses the problem of solving multi-component monotonic MDPs with both budget and capacity constraints. The budget constraint limits the total number of restorative actions and the capacity constraint limits the number of restorative actions that can be performed simultaneously. While prior methods dealt with budget constraints, including capacity constraints in prior methods leads to an exponential increase in computational complexity as the number of components in the MDP grows. We propose a two-step planning approach to address this challenge. First, we partition the components of the multi-component MDP into groups, where the number of groups is determined by the capacity constraint. We achieve this partitioning by solving a Linear Sum Assignment Problem (LSAP). Each group is then allocated a fraction of the total budget proportional to its size. This partitioning effectively decouples the large multi-component MDP into smaller subproblems, which are computationally feasible because the capacity constraint is simplified and the budget constraint can be addressed using existing methods. Subsequently, we use a meta-trained PPO agent to obtain an approximately optimal policy for each group. To validate our approach, we apply it to the problem of scheduling repairs for a large group of industrial robots, constrained by a limited number of repair technicians and a total repair budget. Our results demonstrate that the proposed method outperforms baseline approaches in terms of maximizing the average uptime of the robot swarm, particularly for large swarm sizes.