🤖 AI Summary
Traditional estimation-of-distribution algorithms (EDAs) for black-box combinatorial optimization rely on explicit variable dependency graphs, making them ill-suited to modeling complex, high-order interactions among variables. Method: This paper proposes a permutation-invariant reinforcement learning framework that trains an autoregressive generative model on randomly permuted variable sequences; the sequence randomization acts as an information-preserving dropout that enforces permutation invariance. The framework integrates Group Relative Policy Optimization (GRPO) with a scale-invariant advantage function, eliminating any assumption of a fixed variable ordering. Contribution/Results: By bypassing explicit dependency-graph learning, the method significantly reduces computational overhead while improving sample efficiency and search robustness. It achieves state-of-the-art performance against diverse baseline algorithms across problem scales and consistently avoids catastrophic failures.
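The scale-invariant advantage used in GRPO-style updates can be sketched as follows: each sampled solution's reward is normalized by the mean and standard deviation of its sampling group, so the policy-gradient signal is unchanged under affine rescaling of the objective. This is a minimal sketch of the general idea; the paper's exact advantage formula may differ.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Scale-invariant advantages: standardize rewards within a
    sampling group so that shifting or rescaling the objective
    leaves the policy-gradient signal unchanged."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Rescaling all rewards by a positive constant (e.g. x10) yields
# the same advantages, so updates do not depend on objective scale.
a1 = group_relative_advantages([1.0, 2.0, 4.0, 5.0])
a2 = group_relative_advantages([10.0, 20.0, 40.0, 50.0])
assert np.allclose(a1, a2)
```

The normalization also centers the advantages at zero, so within each group roughly half the samples are reinforced and half are suppressed regardless of the raw reward magnitudes.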
📝 Abstract
We introduce an order-invariant reinforcement learning framework for black-box combinatorial optimization. Classical estimation-of-distribution algorithms (EDAs) often rely on learning explicit variable dependency graphs, which can be costly and may fail to capture complex interactions efficiently. In contrast, we parameterize a multivariate autoregressive generative model trained without a fixed variable ordering. By sampling random generation orders during training (a form of information-preserving dropout), the model is encouraged to be invariant to variable order, promoting search-space diversity and shaping the model to focus on the most relevant variable dependencies, which improves sample efficiency. We adapt Group Relative Policy Optimization (GRPO) to this setting, providing stable policy-gradient updates from scale-invariant advantages. Across a wide range of baseline algorithms and problem instances of varying sizes, our method frequently achieves the best performance and consistently avoids catastrophic failures.
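The order-free autoregressive sampling described above can be illustrated with a small sketch: draw a fresh random permutation per sample and generate each variable conditioned on those already assigned. The `logits_fn` argument is a hypothetical stand-in for the learned model's conditional; the actual paper's architecture is not specified here.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_solution(logits_fn, n_vars):
    """Sample a binary solution autoregressively under a random
    variable order. `logits_fn(partial, order, t)` (a stand-in for
    the learned model) returns the conditional logit of variable
    order[t] given the variables already assigned in `partial`."""
    order = rng.permutation(n_vars)   # fresh random generation order
    x = np.full(n_vars, -1)           # -1 marks "not yet generated"
    for t, i in enumerate(order):
        p = 1.0 / (1.0 + np.exp(-logits_fn(x, order, t)))
        x[i] = int(rng.random() < p)
    return x, order

# Toy conditional: bias toward 1 once any earlier variable is 1.
def toy_logits(partial, order, t):
    return 2.0 if (partial == 1).any() else 0.0

x, order = sample_solution(toy_logits, 8)
assert set(x.tolist()) <= {0, 1}
```

Because every training sample uses a different permutation, the model cannot overfit to one conditioning order and is pushed toward predictions that depend only on which variables are set, not on when they were set.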