🤖 AI Summary
Problem: Existing mixed-integer linear programming (MILP) instances lack comparable structural representations, and mainstream similarity measures either rely on labeled data or suffer from insufficient accuracy and poor generalizability.
Method: We propose the first purely mathematically grounded, label-free distance metric for MILP instances: constraints’ right-hand sides, coefficients, and variables are discretized into empirical distributions, and the Earth Mover’s Distance (EMD) is employed to quantify distributional divergence. The method is both interpretable and efficient—our greedy EMD approximation achieves near-optimal accuracy while accelerating computation by approximately 200×.
Contribution/Results: Experiments on StrIPLIB demonstrate that our metric significantly outperforms unsupervised baselines in clustering tasks and matches the performance of supervised classification models, validating its effectiveness and practicality in capturing intrinsic MILP instance structure.
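As a rough illustration of the discretize-then-compare idea described above, the sketch below builds empirical histograms from two coefficient vectors and computes their 1D Earth Mover's Distance via the CDF formulation. The bin count, value range, and function name are illustrative assumptions, not the paper's actual parameters.

```python
import numpy as np

def empirical_emd(a, b, bins=16, lo=-10.0, hi=10.0):
    """1D EMD between the empirical distributions of two coefficient
    vectors, discretized into a shared histogram. Illustrative sketch:
    bin count and clipping range are assumptions, not the paper's
    published settings."""
    ha, _ = np.histogram(np.clip(a, lo, hi), bins=bins, range=(lo, hi))
    hb, _ = np.histogram(np.clip(b, lo, hi), bins=bins, range=(lo, hi))
    pa = ha / ha.sum()
    pb = hb / hb.sum()
    # For 1D distributions, EMD equals the L1 distance between CDFs.
    return float(np.abs(np.cumsum(pa) - np.cumsum(pb)).sum())
```

Identical coefficient vectors yield a distance of zero, and vectors with mass concentrated in distant bins yield a strictly positive distance, which is the behavior the constraint-level comparison relies on.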
📝 Abstract
Mixed-integer linear programming (MILP) is a powerful tool for addressing a wide range of real-world problems, but it lacks a clear structure for comparing instances. A reliable similarity metric could establish meaningful relationships between instances, enabling more effective evaluation of instance-set heterogeneity and providing better guidance to solvers, particularly when machine learning is involved. Existing similarity metrics often lack precision in identifying instance classes or rely heavily on labeled data, which limits their applicability and generalization. To bridge this gap, this paper introduces the first mathematical distance metric for MILP instances, derived directly from their mathematical formulations. By discretizing right-hand sides, weights, and variables into classes, the proposed metric draws inspiration from the Earth Mover's Distance (EMD) to quantify mismatches in weight-variable distributions for constraint comparisons. This approach naturally extends to enable instance-level comparisons. We evaluate both an exact and a greedy variant of our metric under various parameter settings, using the StrIPLIB dataset. Results show that all components of the metric contribute to class identification, and that the greedy version achieves accuracy nearly identical to the exact formulation while being nearly 200 times faster. Compared to state-of-the-art baselines, including feature-based, image-based, and neural network models, our unsupervised method consistently outperforms all non-learned approaches and rivals the performance of a supervised classifier on class and subclass grouping tasks.
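The abstract contrasts an exact and a greedy variant of the metric. One plausible reading of that trade-off, assuming the instance-level distance involves pairing up constraints across two instances, is sketched below: the greedy version repeatedly takes the cheapest remaining pair from a pairwise constraint-distance matrix instead of solving the assignment problem optimally. The function name and matching scheme are hypothetical illustrations, not the paper's actual algorithm.

```python
import numpy as np

def greedy_match_cost(D):
    """Greedy approximation to an optimal constraint matching: at each
    step, pair the cheapest remaining (row, column) entry of the
    pairwise distance matrix D and retire that row and column.
    Hypothetical sketch of how a greedy variant could trade a little
    accuracy for a large speedup over exact assignment."""
    D = D.astype(float).copy()
    total = 0.0
    for _ in range(min(D.shape)):
        i, j = np.unravel_index(np.argmin(D), D.shape)
        total += D[i, j]
        D[i, :] = np.inf  # row i is matched; exclude it from now on
        D[:, j] = np.inf  # column j is matched; exclude it too
    return total
```

This runs in roughly quadratic passes over the matrix rather than solving a cubic-time assignment problem, which is consistent in spirit with the reported near-identical accuracy at a large constant-factor speedup, though the paper's actual greedy procedure may differ.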