🤖 AI Summary
This work proposes “everywhere learning”, a paradigm that addresses a limitation of conventional AI training: minimizing average loss does not guarantee that loss constraints hold for every individual sample. By imposing pointwise constraints during training, the method enforces loss bounds almost surely, that is, with probability one over the data distribution. An approximate duality theory developed in the paper shows that dual variables implicitly reweight the data distribution toward points where constraints are hardest to satisfy, and that a sparse L1 penalty on constraint relaxations can be used to control generalization. Theoretically, generalization is shown to depend on the mismatch between where the data distribution concentrates its mass and where the hard-to-satisfy constraints concentrate theirs. An experiment in agentic classification for language model tasks illustrates the framework.
📝 Abstract
Everywhere learning is a new paradigm whereby Artificial Intelligence (AI) systems are trained to satisfy loss constraints with probability one over the data distribution. This is in contrast to the standard paradigm of training AI systems to minimize average losses. We develop an approximate duality theory to substantiate a generalization analysis that establishes the proximity between solutions of empirical and statistical everywhere learning problems. Our results show that dual variables reweight the data distribution towards points at which loss constraints are more difficult to satisfy, and that generalization is controlled by the mismatch between the concentration of mass of the data distribution and the concentration of mass on points where constraints are more difficult to satisfy. We further show that we can control generalization with a sparse L1 penalty on constraint relaxations. We illustrate the merits of everywhere learning with an experiment in agentic classification for language model tasks.
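
To make the reweighting idea concrete, the following is a minimal toy sketch, not the paper's algorithm. It assumes a hypothetical one-dimensional problem: choose a scalar `theta` that minimizes the average squared loss subject to the per-sample constraint `(theta - x_i)^2 <= eps + s_i`, with slacks `s_i >= 0` penalized by `rho` times their average. Under this assumed formulation, each sample gets a dual variable `lam_i` confined to `[0, rho]`, and the primal minimizer is a mean of the data reweighted by `1 + lam_i`. The function name `everywhere_mean` and all parameters are illustrative choices, and the paper's exact formulation may differ.

```python
import numpy as np


def everywhere_mean(x, eps, rho, dual_step=1.0, iters=500):
    """Toy projected dual ascent for a pointwise-constrained mean estimate.

    Assumed (illustrative) problem:
        minimize   mean_i (theta - x_i)^2 + rho * mean_i s_i
        subject to (theta - x_i)^2 <= eps + s_i,  s_i >= 0  for every i.
    """
    x = np.asarray(x, dtype=float)
    lam = np.zeros_like(x)  # one dual variable per sample
    for _ in range(iters):
        # Primal step: for this quadratic loss, the Lagrangian minimizer is
        # the mean of the data reweighted by (1 + lam_i).
        w = 1.0 + lam
        theta = np.sum(w * x) / np.sum(w)
        loss = (theta - x) ** 2
        # Dual step: raise lam_i where the constraint is violated, lower it
        # otherwise, and project onto [0, rho] (the cap implied by the
        # L1 penalty on the slacks in this assumed formulation).
        lam = np.clip(lam + dual_step * (loss - eps), 0.0, rho)
    w = 1.0 + lam
    theta = np.sum(w * x) / np.sum(w)
    return theta, lam, (theta - x) ** 2


if __name__ == "__main__":
    data = [0.0, 0.0, 0.0, 0.0, 1.0]  # four typical points, one hard point
    theta, lam, loss = everywhere_mean(data, eps=0.3, rho=10.0)
    print("theta:", theta)
    print("dual variables:", lam)
    print("worst per-sample loss:", loss.max())
```

On this toy data the plain average is 0.2, which leaves a loss of 0.64 on the isolated point. With `eps=0.3` and a loose cap `rho=10.0`, only the isolated point ends up with a nonzero dual variable, and the estimate moves to about 0.452 so that every per-sample loss is at most 0.3. With a tight cap such as `rho=1.0`, the dual variable saturates at the cap and the constraint on the hard point is relaxed instead of enforced.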