AI Summary
This work addresses data sufficiency for linear optimization under cost-vector uncertainty: identifying a minimal dataset that suffices to determine an optimal decision. Methodologically, it introduces a geometric sufficiency criterion for linear programming, grounded in convex geometry and duality theory, that characterizes which cost directions matter for optimality relative to the task constraints and the uncertainty set. It also designs a task-driven data selection algorithm that constructs a minimal or least-costly sufficient dataset. The results show that small, well-chosen datasets can often fully determine the optimal decision. Together, these contributions give theoretical guarantees and a constructive procedure for task-aware data acquisition, avoiding the statistical assumptions and large-sample requirements on which conventional sufficiency analyses rely.
Abstract
We study the fundamental question of how informative a dataset is for solving a given decision-making task. In our setting, the dataset provides partial information about unknown parameters that influence task outcomes. Focusing on linear programs, we characterize when a dataset is sufficient to recover an optimal decision, given an uncertainty set on the cost vector. Our main contribution is a sharp geometric characterization that identifies the directions of the cost vector that matter for optimality, relative to the task constraints and uncertainty set. We further develop a practical algorithm that, for a given task, constructs a minimal or least-costly sufficient dataset. Our results reveal that small, well-chosen datasets can often fully determine optimal decisions, offering a principled foundation for task-aware data selection.
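To make the notion of sufficiency concrete, the sketch below checks it by brute force on a toy problem. It is not the paper's characterization or algorithm; it is an illustration under several assumptions that do not come from the abstract: the feasible set is a polytope given by an explicit vertex list, the task is to minimize c·x, the uncertainty set is a box, and each data point is an exact linear measurement q·c = s. The function name `certified_vertex` and all numbers are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def certified_vertex(vertices, queries, answers, lower, upper, tol=1e-9):
    """Return the index of a vertex that minimizes c.x for EVERY cost vector c
    in the box [lower, upper] consistent with the measurements q.c = s,
    or None if no single vertex is optimal for all such c.

    Brute-force illustration only (assumed setup, not the paper's method).
    """
    V = np.asarray(vertices, dtype=float)
    Q = np.asarray(queries, dtype=float)
    has_data = Q.size > 0
    A_eq = Q if has_data else None
    b_eq = np.asarray(answers, dtype=float) if has_data else None
    bounds = list(zip(lower, upper))

    for i, v in enumerate(V):
        robustly_optimal = True
        for j, w in enumerate(V):
            if i == j:
                continue
            # Worst case of c.(v - w) over consistent c.
            # linprog minimizes, so negate the objective to maximize.
            res = linprog(-(v - w), A_eq=A_eq, b_eq=b_eq, bounds=bounds)
            if res.status != 0 or -res.fun > tol:
                # Some consistent c makes w strictly better than v
                # (or the measurements are inconsistent with the box).
                robustly_optimal = False
                break
        if robustly_optimal:
            return i
    return None


# Toy task: minimize c.x over the probability simplex in R^3.
simplex = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
lower, upper = [0.0, 0.5, 0.6], [1.0, 1.0, 1.0]

# With no data, no vertex is optimal for every cost in the box.
no_data = certified_vertex(simplex, [], [], lower, upper)

# A single measurement of the first cost coordinate settles the decision.
one_point = certified_vertex(simplex, [[1, 0, 0]], [0.2], lower, upper)
```

In this toy instance one well-chosen measurement is enough: once the first cost coordinate is known to be 0.2, it lies below the lower bounds of the other two coordinates, so the first vertex is optimal for every remaining cost vector. The pairwise differences v - w are the only cost directions the check ever looks at, which is the intuition behind asking which directions of the cost vector matter for optimality.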