Distributionally Robust Coreset Selection under Covariate Shift

📅 2025-01-24
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the practical challenge of unknown and shifted covariate distributions during deployment, this paper proposes a distributionally robust coreset selection method. It models worst-case distributional shifts within a Wasserstein ball, theoretically derives an upper bound on generalization error, and optimizes the selection of a representative subset by minimizing this bound. This work is the first to integrate distributionally robust optimization into coreset selection, explicitly ensuring robustness of training data selection against covariate shift. By leveraging convex optimization analysis and gradient approximation techniques, the method scales effectively to deep learning models. Empirical evaluation across multiple covariate shift benchmarks demonstrates substantial improvements over conventional approaches: under severe shift conditions, average test accuracy increases by 5.2%, significantly enhancing out-of-distribution generalization capability.

Technology Category

Application Category

📝 Abstract
Coreset selection, which involves selecting a small subset from an existing training dataset, is an approach to reducing training data, and various approaches have been proposed for this method. In practical situations where these methods are employed, it is often the case that the data distributions differ between the development phase and the deployment phase, with the latter being unknown. Thus, it is challenging to select an effective subset of training data that performs well across all deployment scenarios. We therefore propose Distributionally Robust Coreset Selection (DRCS). DRCS theoretically derives an estimate of the upper bound for the worst-case test error, assuming that the future covariate distribution may deviate within a defined range from the training distribution. Furthermore, by selecting instances in a way that suppresses the estimate of the upper bound for the worst-case test error, DRCS achieves distributionally robust training instance selection. This study is primarily applicable to convex training computation, but we demonstrate that it can also be applied to deep learning under appropriate approximations. In this paper, we focus on covariate shift, a type of data distribution shift, and demonstrate the effectiveness of DRCS through experiments.
Problem

Research questions and friction points this paper is trying to address.

Robust Selection
Representative Dataset
Feature Change
Innovation

Methods, ideas, or system contributions that make the work stand out.

Distributionally Robust Core Set Selection
Adaptive Learning
Worst-case Error Rate
🔎 Similar Papers
No similar papers found.
T
Tomonari Tanaka
Nagoya University
Hiroyuki Hanada
Hiroyuki Hanada
Nagoya University
machine learningmathematical optimization
H
Hanting Yang
Nagoya University
Tatsuya Aoyama
Tatsuya Aoyama
Research Scientist, Meta
language modelingpretraining dynamicsinterpretabilitycognitive science
Y
Yu Inatsu
Nagoya Institute of Technology
S
Satoshi Akahane
Nagoya University
Y
Yoshito Okura
Nagoya University
N
Noriaki Hashimoto
RIKEN
T
Taro Murayama
DENSO CORPORATION
H
Hanju Lee
DENSO CORPORATION
S
Shinya Kojima
DENSO CORPORATION
I
Ichiro Takeuchi
Nagoya University & RIKEN