Unsupervised Domain Adaptation with an Unobservable Source Subpopulation

📅 2025-09-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses unsupervised domain adaptation when part of the source domain's subgroup structure is unobserved: the source is partitioned into four subgroups by a binary label $Y$ and a binary background variable $A$, and the subgroup with $A=1, Y=1$ is entirely missing. Ignoring this missingness biases predictions on the target domain. The authors propose a joint modeling framework that combines background-specific and overall prediction models, described as the first method to recover target-domain prediction when a subgroup is unobservable, and support it with an upper bound on the prediction error and asymptotic guarantees for the estimator. The approach estimates the latent subgroup proportions via distribution matching, without imposing strong assumptions on the missingness mechanism. Experiments on synthetic data and on real-world medical and image datasets show that the method outperforms a naive baseline that ignores the missing structure, with higher accuracy and improved robustness on the target domain.

📝 Abstract

We study an unsupervised domain adaptation problem where the source domain consists of subpopulations defined by the binary label $Y$ and a binary background (or environment) $A$. We focus on a challenging setting in which one such subpopulation in the source domain is unobservable. Naively ignoring this unobserved group can result in biased estimates and degraded predictive performance. Despite this structured missingness, we show that prediction in the target domain can still be recovered. Specifically, we rigorously derive both background-specific and overall prediction models for the target domain. For practical implementation, we propose a distribution matching method to estimate the subpopulation proportions. We provide theoretical guarantees for the asymptotic behavior of our estimator and establish an upper bound on the prediction error. Experiments on both synthetic and real-world datasets show that our method outperforms a naive benchmark that does not account for the unobservable source subpopulation.
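
To make the idea of estimating subpopulation proportions by distribution matching concrete, here is a minimal sketch. It is not the paper's estimator: it assumes a one-dimensional feature, two fully observed component samples, and a simple grid search over the mixing weight that minimizes the squared distance between histograms. The function names `histogram` and `match_proportion`, the Gaussian components, and the true weight of 0.3 are all hypothetical choices made for illustration.

```python
import random


def histogram(samples, edges):
    """Return normalized bin frequencies of samples over the given bin edges."""
    counts = [0] * (len(edges) - 1)
    for s in samples:
        for i in range(len(counts)):
            if edges[i] <= s < edges[i + 1]:
                counts[i] += 1
                break  # samples outside all bins are simply ignored
    total = sum(counts)
    return [c / total for c in counts]


def match_proportion(comp0, comp1, target, edges, grid=1001):
    """Grid-search the weight w on comp1 whose mixture best matches the target."""
    h0 = histogram(comp0, edges)
    h1 = histogram(comp1, edges)
    ht = histogram(target, edges)
    best_w, best_loss = 0.0, float("inf")
    for k in range(grid):
        w = k / (grid - 1)
        # Squared distance between the mixture histogram and the target histogram.
        loss = sum((w * a + (1 - w) * b - t) ** 2 for a, b, t in zip(h1, h0, ht))
        if loss < best_loss:
            best_w, best_loss = w, loss
    return best_w


# Illustrative synthetic data (assumed, not from the paper).
rng = random.Random(0)
true_w = 0.3
comp0 = [rng.gauss(-2.0, 1.0) for _ in range(5000)]
comp1 = [rng.gauss(2.0, 1.0) for _ in range(5000)]
target = [
    rng.gauss(2.0, 1.0) if rng.random() < true_w else rng.gauss(-2.0, 1.0)
    for _ in range(5000)
]
edges = [-8.0 + 0.5 * i for i in range(33)]  # 32 bins covering [-8, 8]

w_hat = match_proportion(comp0, comp1, target, edges)
print(f"estimated proportion: {w_hat:.3f} (true value {true_w})")
```

In the setting studied here one source subpopulation has no samples at all, so a direct match like this one is not available for that group; the sketch only shows the basic matching step on which such estimators build.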

Problem

Research questions and friction points this paper is trying to address.

Addresses unsupervised domain adaptation with missing source subpopulation data

Recovers target domain predictions despite unobservable source subgroups

Proposes distribution matching to estimate missing subpopulation proportions

Innovation

Methods, ideas, or system contributions that make the work stand out.

Handles unobservable source subpopulation via distribution matching

Derives background-specific and overall prediction models

Provides theoretical guarantees and error bounds
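
As a generic illustration of how background-specific and overall prediction models relate, the law of total probability connects them. The notation is assumed for this sketch and is not taken from the paper: $X$ denotes the features and $P_T$ a probability under the target domain.

$$
P_T(Y=1 \mid X=x) \;=\; \sum_{a \in \{0,1\}} P_T(Y=1 \mid X=x, A=a)\, P_T(A=a \mid X=x).
$$

The overall model on the left is a weighted average of the two background-specific models, with weights given by how likely each background is at $x$.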
