🤖 AI Summary
This paper addresses the dual challenge of adversarial contamination (an $\varepsilon$-fraction of outliers) and distributional shift in training data. The authors propose the first unified framework integrating distributionally robust optimization (DRO) with contamination-robust learning. Their method constructs a robust objective based on the Wasserstein-1 distance and draws on principles from robust statistics to design an efficient convex optimization algorithm, applicable to generalized linear models with convex Lipschitz loss functions. Theoretically, the paper establishes the first $O(\sqrt{\varepsilon})$ estimation-error bound on the true DRO objective under $\varepsilon$-contamination, going beyond prior robustness analyses that handle only a single source of uncertainty, while guaranteeing polynomial-time solvability and statistical consistency. Empirical results show that the framework significantly improves model stability and generalization accuracy under these compound uncertainties.
📝 Abstract
Distributionally Robust Optimization (DRO) provides a framework for decision-making under distributional uncertainty, yet its effectiveness can be compromised by outliers in the training data. This paper introduces a principled approach to simultaneously address both challenges. We focus on optimizing Wasserstein-1 DRO objectives for generalized linear models with convex Lipschitz loss functions, where an $\varepsilon$-fraction of the training data is adversarially corrupted. Our primary contribution lies in a novel modeling framework that integrates robustness against training data contamination with robustness against distributional shifts, alongside an efficient algorithm inspired by robust statistics to solve the resulting optimization problem. We prove that our method achieves an estimation error of $O(\sqrt{\varepsilon})$ for the true DRO objective value using only the contaminated data under the bounded covariance assumption. This work establishes the first rigorous guarantees, supported by efficient computation, for learning under the dual challenges of data contamination and distributional shifts.
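For background on the setting the abstract describes (not the paper's contamination-robust algorithm), the following sketch illustrates the standard Wasserstein-1 DRO formulation for a generalized linear model with a Lipschitz loss: by the well-known duality for Wasserstein-1 balls, the worst-case expected loss over a radius-$\rho$ ball around the empirical distribution reduces to the empirical risk plus a norm penalty on the parameters. The example below applies this to logistic regression; the radius `rho`, step size, and data are illustrative choices, not values from the paper.

```python
import numpy as np

# Standard Wasserstein-1 DRO background (not the paper's method):
# for a Lipschitz loss over a linear model, the worst-case risk
#   sup_{Q : W1(Q, P_n) <= rho} E_Q[loss]
# equals the empirical risk plus rho times a norm of theta.
# Logistic loss is 1-Lipschitz in its margin, so the penalty is rho*||theta||_2.

def logistic_loss(theta, X, y):
    """Empirical logistic loss; labels y in {-1, +1}."""
    margins = y * (X @ theta)
    return np.mean(np.log1p(np.exp(-margins)))

def dro_objective(theta, X, y, rho):
    """Wasserstein-1 DRO surrogate: empirical loss + rho * ||theta||_2."""
    return logistic_loss(theta, X, y) + rho * np.linalg.norm(theta)

def dro_gradient(theta, X, y, rho):
    """(Sub)gradient of the penalized objective."""
    margins = y * (X @ theta)
    coeffs = -1.0 / (1.0 + np.exp(margins))  # derivative of log1p(exp(-m))
    grad_loss = (X * (coeffs * y)[:, None]).mean(axis=0)
    norm = np.linalg.norm(theta)
    grad_pen = rho * theta / norm if norm > 0 else np.zeros_like(theta)
    return grad_loss + grad_pen

def fit_dro(X, y, rho=0.1, lr=0.5, steps=500):
    """Plain subgradient descent on the convex DRO surrogate."""
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        theta -= lr * dro_gradient(theta, X, y, rho)
    return theta

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_theta = np.array([2.0, -1.0, 0.5])
y = np.sign(X @ true_theta + 0.1 * rng.normal(size=200))

theta_hat = fit_dro(X, y, rho=0.1)
print(np.round(theta_hat, 2))
```

The paper's contribution is precisely that such convex surrogates remain tractable, with $O(\sqrt{\varepsilon})$ guarantees, even when an $\varepsilon$-fraction of the rows of `X, y` above were adversarially replaced; the sketch omits that contamination-handling step.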