๐ค AI Summary
This work addresses the challenge of actively designing training data distributions to improve average out-of-distribution (OOD) generalization. Motivated by the severe performance degradation of standard empirical risk minimization under distribution shift, we establish the first provable theoretical connection between training distribution selection and OOD error. We propose two novel paradigms: (i) a bilevel optimization framework formulated in the space of probability measures, and (ii) distribution-aware learning explicitly minimizing a theoretically derived upper bound on OOD risk. Through generalization bound analysis, measure-theoretic optimization, and operator learning experiments, we demonstrate consistent improvements across diverse function and operator approximation tasksโachieving 12.3%โ28.7% absolute gains in OOD accuracy over strong baselines. Our results rigorously validate that explicit modeling and optimization of the training distribution are essential for robust generalization.
๐ Abstract
Out-of-distribution (OOD) generalization remains a fundamental challenge in machine learning. Models trained on one data distribution often experience substantial performance degradation when evaluated on shifted or unseen domains. To address this challenge, the present paper studies the design of training data distributions that maximize average-case OOD performance. First, a theoretical analysis establishes a family of generalization bounds that quantify how the choice of training distribution influences OOD error across a predefined family of target distributions. These insights motivate the introduction of two complementary algorithmic strategies: (i) directly formulating OOD risk minimization as a bilevel optimization problem over the space of probability measures and (ii) minimizing a theoretical upper bound on OOD error. Last, the paper evaluates the two approaches across a range of function approximation and operator learning examples. The proposed methods significantly improve OOD accuracy over standard empirical risk minimization with a fixed distribution. These results highlight the potential of distribution-aware training as a principled and practical framework for robust OOD generalization.