🤖 AI Summary
Non-binary integer variables in mixed-integer linear programming (MILP) make it hard for end-to-end learning methods to guarantee solution feasibility, a fundamental challenge that existing learned solvers leave unaddressed.
Method: We propose the first solver-free, reinforcement learning–driven framework for MILP, built upon Proximal Policy Optimization (PPO) and graph neural networks (GNNs). Our end-to-end architecture jointly models constraint satisfaction for both binary and non-binary integer variables. We introduce a novel branch-and-assignment joint action space and a self-supervised feasibility reward mechanism to enable progressive optimization—from the first feasible solution to near-optimal solutions.
Results: On standard benchmarks, our method achieves a 100% feasibility rate, attains an average objective value at 99.2% of the optimum, and solves problems 3.8× faster than traditional branch-and-bound, outperforming all prior learning-based solvers in both solution quality and efficiency.
📝 Abstract
Mixed-integer linear programming (MILP) is a widely used optimization technique across various fields. Existing $\textit{end-to-end learning}$ methods for MILP generate values for a subset of decision variables and delegate the remaining problem to traditional MILP solvers. However, this approach often fails to guarantee solution feasibility (i.e., satisfying all constraints) because its predictions are inaccurate, and it primarily focuses on binary decision variables. Satisfying all constraints is a prerequisite for obtaining the optimal solution, and the feasibility issue becomes even more critical with non-binary integer (integer, for short) variables. Thus, addressing the feasibility of MILP involving integer variables is crucial. To address these challenges, we propose a novel reinforcement learning (RL)-based solver that not only finds the first feasible solution but also incrementally discovers better feasible solutions without delegating the remainder to off-the-shelf solvers. Our experimental results demonstrate that the proposed method achieves (near-)optimal solutions.
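To make the notion of feasibility concrete, the following is a minimal sketch of a feasibility check for an integer program of the form $Ax \le b$ with bounded integer variables. The function name `is_feasible`, the tolerance `tol`, and the toy instance are illustrative assumptions, not part of the proposed method.

```python
from typing import Sequence


def is_feasible(
    A: Sequence[Sequence[float]],
    b: Sequence[float],
    x: Sequence[float],
    lower: Sequence[float],
    upper: Sequence[float],
    tol: float = 1e-9,
) -> bool:
    """Return True if x is integral, within bounds, and satisfies A x <= b."""
    # Integrality: every entry must be (numerically) a whole number.
    if any(abs(v - round(v)) > tol for v in x):
        return False
    # Variable bounds: lower[i] <= x[i] <= upper[i].
    if any(v < lo - tol or v > hi + tol for v, lo, hi in zip(x, lower, upper)):
        return False
    # Linear constraints, checked row by row.
    for row, rhs in zip(A, b):
        if sum(a * v for a, v in zip(row, x)) > rhs + tol:
            return False
    return True


# Hypothetical toy instance: 2x + 3y <= 12, x + y <= 5, with 0 <= x, y <= 5.
A = [[2, 3], [1, 1]]
b = [12, 5]
lower, upper = [0, 0], [5, 5]

print(is_feasible(A, b, [3, 2], lower, upper))    # True: both constraints hold
print(is_feasible(A, b, [3, 3], lower, upper))    # False: 2*3 + 3*3 = 15 > 12
print(is_feasible(A, b, [2.5, 2], lower, upper))  # False: 2.5 is not an integer
```

The second call illustrates why inaccurate predictions are costly: changing a single integer variable by one unit turns a feasible assignment into an infeasible one, and with non-binary integers there are many more such values to get wrong than with 0/1 variables.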