🤖 AI Summary
This study addresses the problem of testing equality of two conditional distributions when covariates are high-dimensional and responses are multivariate. The authors propose a cross-generative alignment approach: a conditional generator is trained on each sample and used to generate responses at the other sample's covariate values, so that generated and observed responses can be compared directly without estimating conditional density ratios. The test statistic is the supremum of an empirical process indexed by a reproducing kernel Hilbert space (RKHS), and it is calibrated with a multiplier bootstrap. On the theoretical side, the authors derive the limiting distribution of the statistic under both the null and alternative hypotheses, prove bootstrap validity and consistency of the test, and show that the procedure is doubly robust to errors in estimating the conditional generators. Simulations and real data applications suggest that the method performs well with multivariate responses and high-dimensional covariates.
📝 Abstract
We study the problem of testing whether two conditional distributions are equal using generative models. The proposed method learns a conditional generator from each sample and uses it to create responses at covariate values observed in the other sample, allowing generated and observed responses to be compared directly. By aligning covariates through cross-generation, the approach avoids conditional density-ratio estimation and local smoothing over high-dimensional covariates. The population version of this construction yields a conditional discrepancy that characterizes equality of the two conditional distributions under suitable overlap conditions, while the sample version leads to a test statistic defined as the supremum of an RKHS-indexed empirical process with multiplier bootstrap calibration. A computationally efficient algorithm for evaluating the statistic and its bootstrap analogue is developed based on alternating maximization and the kernel trick. Theoretically, we derive the limiting distribution of the test statistic under both the null and alternative hypotheses, prove bootstrap validity and consistency of the resulting test, and show that the proposed procedure attains a double-robustness property with respect to conditional generator estimation errors. Simulations and real data applications suggest that the proposed method performs well for multivariate responses and high-dimensional covariates.
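To make the cross-generation idea concrete, here is a minimal Python sketch. It is an illustration under strong simplifying assumptions, not the authors' algorithm. The conditional generators are replaced by a toy linear regression with resampled residuals. The supremum over an RKHS unit ball is taken in closed form as a kernel mean-embedding norm on the joint (covariate, response) space, rather than computed by the alternating maximization the abstract describes. The bandwidth rule and the uncentered Gaussian multipliers are ad hoc choices. All function names (`fit_generator`, `paired_gram`, `cross_generative_test`) are hypothetical.

```python
import numpy as np


def fit_generator(X, Y, rng):
    """Toy conditional generator (assumption): least squares + resampled residuals."""
    design = np.column_stack([np.ones(len(X)), X])  # intercept + covariates
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    resid = Y - design @ coef

    def generate(X_new):
        design_new = np.column_stack([np.ones(len(X_new)), X_new])
        idx = rng.integers(0, len(resid), size=len(X_new))  # resample noise
        return design_new @ coef + resid[idx]

    return generate


def gaussian_gram(A, B, bandwidth):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * bandwidth**2))


def paired_gram(X, Y_obs, Y_gen, bandwidth):
    """H[i, j] = <h_i, h_j>, where h_i = k(., (X_i, Y_obs_i)) - k(., (X_i, Y_gen_i))."""
    Z_obs = np.hstack([X, Y_obs])
    Z_gen = np.hstack([X, Y_gen])
    K_oo = gaussian_gram(Z_obs, Z_obs, bandwidth)
    K_og = gaussian_gram(Z_obs, Z_gen, bandwidth)
    K_gg = gaussian_gram(Z_gen, Z_gen, bandwidth)
    return K_oo - K_og - K_og.T + K_gg


def cross_generative_test(X1, Y1, X2, Y2, n_boot=200, seed=0):
    """Return (statistic, bootstrap p-value) for the simplified cross-generated test."""
    rng = np.random.default_rng(seed)
    bandwidth = np.sqrt(X1.shape[1] + Y1.shape[1])  # ad hoc bandwidth (assumption)

    gen1 = fit_generator(X1, Y1, rng)  # learned from sample 1
    gen2 = fit_generator(X2, Y2, rng)  # learned from sample 2

    # Cross-generation: each generator is evaluated at the OTHER sample's covariates.
    H_at_2 = paired_gram(X2, Y2, gen1(X2), bandwidth)
    H_at_1 = paired_gram(X1, Y1, gen2(X1), bandwidth)

    stat = 0.0
    boot = np.zeros(n_boot)
    for H in (H_at_1, H_at_2):
        n = H.shape[0]
        stat += H.sum() / n  # squared RKHS norm of n^{-1/2} * sum_i h_i
        E = rng.standard_normal((n_boot, n))  # Gaussian multipliers
        boot += np.einsum("bi,ij,bj->b", E, H, E) / n  # multiplier analogue

    p_value = (1 + np.sum(boot >= stat)) / (n_boot + 1)
    return stat, p_value
```

Calling `cross_generative_test(X1, Y1, X2, Y2)` with two-dimensional NumPy arrays (rows are observations) returns the statistic and a bootstrap p-value. Because the covariates are shared within each observed/generated pair, the comparison targets the conditional distributions rather than the covariate marginals, which is the alignment role that cross-generation plays in the abstract.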