Efficient Inference under Label Shift in Unsupervised Domain Adaptation

📅 2025-08-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
In unsupervised domain adaptation, accurate estimation of target-domain population parameters is challenging when label shift exists between the source and target domains. To address this, we propose a three-stage progressive estimation framework based on modeling the outcome density ratio: it starts from an initial heuristic guess, proceeds to a consistent estimator, and culminates in an efficient estimator of the population parameters. This self-evolving estimation process is novel in the statistical literature, and we highlight its connection to prediction-powered inference (PPI). The asymptotic properties of the proposed estimators are rigorously established. Simulation studies and multiple real-world applications show improved performance over existing methods.

📝 Abstract
In many real-world applications, researchers aim to deploy models trained in a source domain to a target domain, where obtaining labeled data is often expensive, time-consuming, or even infeasible. While most existing literature assumes that the labeled source data and the unlabeled target data follow the same distribution, distribution shifts are common in practice. This paper focuses on label shift and develops efficient inference procedures for general parameters characterizing the unlabeled target population. A central idea is to model the outcome density ratio between the labeled and unlabeled data. To this end, we propose a progressive estimation strategy that unfolds in three stages: an initial heuristic guess, a consistent estimation, and ultimately, an efficient estimation. This self-evolving process is novel in the statistical literature and of independent interest. We also highlight the connection between our approach and prediction-powered inference (PPI), which uses machine learning models to improve statistical inference in related settings. We rigorously establish the asymptotic properties of the proposed estimators and demonstrate their superior performance compared to existing methods. Through simulation studies and multiple real-world applications, we illustrate both the theoretical contributions and practical benefits of our approach.
Problem

Research questions and friction points this paper is trying to address.

Addressing label shift in unsupervised domain adaptation
Developing efficient inference for target population parameters
Modeling outcome density ratio between labeled and unlabeled data
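
To make the role of the outcome density ratio concrete, here is a minimal sketch for the simplest case of a discrete label. It is not the paper's three-stage estimator: it uses the well-known confusion-matrix approach (often called black box shift estimation), and it works with exact population quantities rather than sample estimates. The toy distributions, the classifier `f`, and all variable names are assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy setup: 3 classes, a discrete feature X with 4 values.
# Label shift means p(x | y) is shared by both domains; only p(y) changes.
p_x_given_y = np.array([
    [0.7, 0.1, 0.1],
    [0.1, 0.6, 0.1],
    [0.1, 0.2, 0.3],
    [0.1, 0.1, 0.5],
])  # rows: x values, columns: y values; each column sums to 1
p_y_source = np.array([0.5, 0.3, 0.2])
p_y_target = np.array([0.2, 0.3, 0.5])

# A fixed (hypothetical) classifier mapping each x value to a predicted label.
f = np.array([0, 1, 2, 2])


def prediction_label_joint(p_y):
    """Return the matrix P(f(X) = i, Y = j) under label marginal p_y."""
    joint_xy = p_x_given_y * p_y  # P(X = x, Y = j), broadcast over columns
    joint = np.zeros((3, 3))
    for x, pred in enumerate(f):
        joint[pred] += joint_xy[x]
    return joint


# Estimable from labeled source data: joint of predictions and labels.
C_source = prediction_label_joint(p_y_source)
# Estimable from unlabeled target data: marginal of predictions only.
mu_target = prediction_label_joint(p_y_target).sum(axis=1)

# Solve C_source @ w = mu_target for the density ratio w(y) = p_t(y) / p_s(y).
w = np.linalg.solve(C_source, mu_target)

# Target-population mean of Y as a reweighted source expectation.
y_values = np.arange(3)
target_mean = np.sum(p_y_source * w * y_values)

print(w)            # density ratio per class
print(target_mean)  # equals the true target mean of Y
```

The target labels are never observed here: the ratio is recovered only from the source joint distribution and the target distribution of predictions, which is what makes the target parameter identifiable under label shift. In practice both quantities would be replaced by sample averages, and the confusion matrix must be invertible.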
Innovation

Methods, ideas, or system contributions that make the work stand out.

Progressive three-stage estimation strategy
Models outcome density ratio between datasets
Connects with prediction-powered inference methods
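
For readers unfamiliar with prediction-powered inference, the sketch below shows the standard PPI point estimate of a mean in the setting without distribution shift: the average prediction on unlabeled data plus a bias correction computed on labeled data. It illustrates PPI only, not the paper's label-shift estimator, and the function name and toy numbers are assumptions made for illustration.

```python
import numpy as np


def ppi_mean(y_labeled, pred_labeled, pred_unlabeled):
    """PPI point estimate of E[Y]: mean prediction plus a labeled-data rectifier."""
    y_labeled = np.asarray(y_labeled, dtype=float)
    pred_labeled = np.asarray(pred_labeled, dtype=float)
    pred_unlabeled = np.asarray(pred_unlabeled, dtype=float)
    # Rectifier: average prediction error measured where labels are available.
    rectifier = np.mean(y_labeled - pred_labeled)
    return np.mean(pred_unlabeled) + rectifier


# Hypothetical toy data: the model under-predicts by 0.5 on the labeled set.
estimate = ppi_mean(
    y_labeled=[1.0, 2.0, 3.0],
    pred_labeled=[0.5, 1.5, 2.5],
    pred_unlabeled=[1.0, 2.0, 3.0, 4.0],
)
print(estimate)
```

The rectifier removes the systematic bias of the predictions, so the estimate stays valid even when the machine learning model is imperfect.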
Seong-ho Lee
Department of Statistics, University of Seoul, Seoul, South Korea
Yanyuan Ma
Department of Statistics, Pennsylvania State University, PA, USA
Jiwei Zhao
University of Wisconsin-Madison
Statistics, Machine Learning, Data Science, Biostatistics, Biomedical Data Science