Unsupervised Domain Adaptation with an Unobservable Source Subpopulation

📅 2025-09-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses unsupervised domain adaptation when one source subgroup is never observed. The source domain is partitioned into four subgroups by a binary label $Y$ and a binary background variable $A$, and the subgroup with $A=1, Y=1$ is entirely missing. Ignoring this missingness biases predictions on the target domain. The authors propose a framework that jointly models background-specific and overall prediction, described as the first method under which target-domain prediction remains recoverable when a subgroup is unobservable. It comes with an upper bound on the prediction error and asymptotic guarantees for the estimator. The approach estimates the latent subgroup proportions via distribution matching, without imposing strong assumptions on the missingness mechanism. Experiments on synthetic data and on real-world medical and image datasets show that the method outperforms a naive baseline that ignores the missing subgroup, with higher accuracy and improved robustness on the target domain.

📝 Abstract
We study an unsupervised domain adaptation problem where the source domain consists of subpopulations defined by the binary label $Y$ and a binary background (or environment) $A$. We focus on a challenging setting in which one such subpopulation in the source domain is unobservable. Naively ignoring this unobserved group can result in biased estimates and degraded predictive performance. Despite this structured missingness, we show that the prediction in the target domain can still be recovered. Specifically, we rigorously derive both background-specific and overall prediction models for the target domain. For practical implementation, we propose the distribution matching method to estimate the subpopulation proportions. We provide theoretical guarantees for the asymptotic behavior of our estimator, and establish an upper bound on the prediction error. Experiments on both synthetic and real-world datasets show that our method outperforms the naive benchmark that does not account for this unobservable source subpopulation.
Problem

Research questions and friction points this paper is trying to address.

Addresses unsupervised domain adaptation with missing source subpopulation data
Recovers target domain predictions despite unobservable source subgroups
Proposes distribution matching to estimate missing subpopulation proportions
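
To make the friction point concrete, the following is a minimal sketch of why a naive model fails when the $A=1, Y=1$ subgroup is absent from the source. It is not the paper's experiment: the one-dimensional Gaussian features, the subgroup sizes, and the helper names `make_domain`, `fit_naive`, and `predict_naive` are all assumptions made for illustration. The naive rule is a per-background nearest-class-mean classifier, so within $A=1$ it has only ever seen $Y=0$.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_domain(counts):
    """Build arrays (x, a, y) from {(a, y): n}. Feature means are invented for this sketch."""
    xs, bgs, ys = [], [], []
    for (a, y), n in counts.items():
        xs.append(rng.normal(loc=2.0 * y + 0.5 * a, scale=1.0, size=n))
        bgs.append(np.full(n, a))
        ys.append(np.full(n, y))
    return np.concatenate(xs), np.concatenate(bgs), np.concatenate(ys)

# Source: the subgroup (A=1, Y=1) is never observed.
src_x, src_a, src_y = make_domain({(0, 0): 500, (0, 1): 500, (1, 0): 500})
# Target: all four subgroups are present (labels kept here only for evaluation).
tgt_x, tgt_a, tgt_y = make_domain(
    {(0, 0): 250, (0, 1): 250, (1, 0): 250, (1, 1): 250}
)

def fit_naive(x, a, y):
    """Per-background class means, using only the classes observed in that background."""
    return {
        int(bg): {int(c): x[(a == bg) & (y == c)].mean() for c in np.unique(y[a == bg])}
        for bg in np.unique(a)
    }

def predict_naive(model, x, a):
    """Assign each point to the nearest observed class mean within its background."""
    preds = np.empty(len(x), dtype=int)
    for i, (xi, ai) in enumerate(zip(x, a)):
        means = model[int(ai)]
        preds[i] = min(means, key=lambda c: abs(xi - means[c]))
    return preds

model = fit_naive(src_x, src_a, src_y)
preds = predict_naive(model, tgt_x, tgt_a)

naive_rate = src_y[src_a == 1].mean()    # empirical P(Y=1 | A=1) in the source
target_rate = tgt_y[tgt_a == 1].mean()   # same quantity in the target
acc_a1 = (preds[tgt_a == 1] == tgt_y[tgt_a == 1]).mean()
print(naive_rate, target_rate, acc_a1)
```

Because the source contains no positive examples with $A=1$, the naive rule can only predict $Y=0$ in that background, so every target point in the $A=1, Y=1$ subgroup is misclassified regardless of how informative the features are.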
Innovation

Methods, ideas, or system contributions that make the work stand out.

Handles unobservable source subpopulation via distribution matching
Derives background-specific and overall prediction models
Provides theoretical guarantees and error bounds
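
As a rough illustration of what estimating proportions by distribution matching can look like, the sketch below recovers a mixing weight for an unlabeled target sample by matching histograms against two labeled component samples. This is a generic textbook-style simplification, not the authors' estimator: it assumes both components are observed and that their feature distributions are the same in source and target, so it does not handle the missing subgroup that the paper is about. The data, the bin count, and the function names `histogram_density` and `match_proportion` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical labeled source samples for two observed subgroups in one background.
src_y0 = rng.normal(0.0, 1.0, size=5000)  # features with Y=0
src_y1 = rng.normal(2.0, 1.0, size=5000)  # features with Y=1

# Unlabeled target sample: a mixture whose weight on Y=1 is unknown to the estimator.
true_w = 0.3
n_tgt = 5000
is_y1 = rng.random(n_tgt) < true_w
tgt = np.where(is_y1, rng.normal(2.0, 1.0, n_tgt), rng.normal(0.0, 1.0, n_tgt))

def histogram_density(x, edges):
    """Normalized histogram on shared bins, used as a crude density estimate."""
    counts, _ = np.histogram(x, bins=edges)
    return counts / counts.sum()

def match_proportion(component0, component1, target, n_bins=30):
    """Least-squares weight w so that w*h1 + (1-w)*h0 best matches the target histogram."""
    lo = min(component0.min(), component1.min(), target.min())
    hi = max(component0.max(), component1.max(), target.max())
    edges = np.linspace(lo, hi, n_bins + 1)
    h0 = histogram_density(component0, edges)
    h1 = histogram_density(component1, edges)
    ht = histogram_density(target, edges)
    diff = h1 - h0
    w = float(np.dot(ht - h0, diff) / np.dot(diff, diff))
    return min(1.0, max(0.0, w))  # clip to a valid proportion

w_hat = match_proportion(src_y0, src_y1, tgt)
print(f"estimated weight: {w_hat:.3f} (true weight used to simulate: {true_w})")
```

The closed-form weight comes from minimizing the squared distance between the target histogram and the two-component mixture. With more than two components, the same idea would become a constrained least-squares problem over the probability simplex.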
Chao Ying
University of Wisconsin-Madison
Jun Jin
Michigan State University
Haotian Zhang
University of Connecticut
Qinglong Tian
University of Waterloo
statistics
Yanyuan Ma
Pennsylvania State University
Yixuan Li
University of Wisconsin-Madison
Jiwei Zhao
University of Wisconsin-Madison
Statistics, Machine Learning, Data Science, Biostatistics, Biomedical Data Science