🤖 AI Summary
To address the exponential growth in model size and computational intractability when embedding deep neural networks (DNNs) into mixed-integer programming (MIP), this paper proposes a modular solution framework based on dual decomposition and the augmented Lagrangian method. The original problem is decomposed into two subproblems: an MIP subproblem involving only integer variables, whose count remains constant regardless of network depth, and a constrained DNN subproblem solved via first-order optimization. This design ensures that the per-iteration computational cost scales linearly with network size. The MIP subproblem is tackled using branch-and-bound, while the DNN subproblem supports arbitrary architectures (e.g., LSTM) and plug-and-play optimizers. On the SurrogateLIB benchmark, the method solves the largest instances up to 120× faster than exact Big-M formulations. Swapping the solver used for the DNN subproblem requires no code modification and yields identical objective values, and swapping the MLP for an LSTM backbone still completes the full optimization within 47 seconds.
📝 Abstract
Embedding deep neural networks (NNs) into mixed-integer programs (MIPs) is attractive for decision making with learned constraints, yet state-of-the-art monolithic linearisations blow up in size and quickly become intractable. In this paper, we introduce a novel dual-decomposition framework that relaxes the single coupling equality u = x with an augmented Lagrange multiplier and splits the problem into a vanilla MIP and a constrained NN block. Each part is tackled by the solver that suits it best (branch-and-cut for the MIP subproblem, first-order optimisation for the NN subproblem), so the model remains modular, the number of integer variables never grows with network depth, and the per-iteration cost scales only linearly with the NN size. On the public SurrogateLIB benchmark, our method proves **scalable**, **modular**, and **adaptable**: it runs 120× faster than an exact Big-M formulation on the largest test case; the NN sub-solver can be swapped from a log-barrier interior step to a projected-gradient routine with no code changes and identical objective value; and swapping the MLP for an LSTM backbone still completes the full optimisation in 47 s without any bespoke adaptation.
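To make the decomposition concrete, here is a minimal toy sketch of the alternating scheme the abstract describes: the coupling u = x is relaxed with a multiplier λ and a quadratic penalty, then the method alternates an exact integer-side minimisation, a few first-order steps on the continuous side, and a dual update. Everything below is a hypothetical stand-in, not the paper's implementation: `f` plays the role of the MIP objective (enumeration over a small grid stands in for branch-and-bound), and `g` is a smooth surrogate for the constrained NN block, minimised by projected gradient.

```python
import numpy as np

# Hypothetical stand-ins for the two subproblems (NOT the paper's code):
#   f(x): integer-side cost; enumeration over `grid` mimics branch-and-bound
#   g(u): smooth surrogate for the NN block, handled by projected gradient
def f(x):
    return (x - 2.3) ** 2

def g_grad(u):
    return u - 1.0          # gradient of g(u) = 0.5 * (u - 1)^2

grid = np.arange(0, 6)      # feasible integer values for x
rho, lam = 1.0, 0.0         # penalty weight and Lagrange multiplier
u = 0.0

for _ in range(50):
    # x-update: exactly minimise the augmented Lagrangian over the integers
    vals = f(grid) + lam * (u - grid) + 0.5 * rho * (u - grid) ** 2
    x = grid[np.argmin(vals)]
    # u-update: a few projected-gradient steps on the continuous-side terms
    for _ in range(20):
        grad = g_grad(u) + lam + rho * (u - x)
        u = np.clip(u - 0.1 * grad, 0.0, 5.0)   # box projection
    # dual update on the single coupling constraint u = x
    lam += rho * (u - x)

print(int(x), round(u, 3))  # the two copies reach consensus: u ≈ x
```

Only the dual variable ties the two blocks together, which is the point of the design: the integer solver and the first-order routine never see each other's internals, so either can be swapped out independently.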