🤖 AI Summary
In observational studies, conventional matching methods can be biased when covariates differ in how much they matter for similarity. To address this, we propose SCOTOMA, a semi-supervised one-to-one matching framework. SCOTOMA jointly leverages a small set of expert-annotated matched pairs and abundant unlabeled (unmatched) data to learn an interpretable quadratic scoring function whose weights estimate the importance of each covariate. We establish consistency of the weight estimator and design an efficient matching search algorithm integrating consistency regularization and a simulated-annealing-inspired heuristic. In simulations, SCOTOMA outperforms popular methods, including Propensity Score Matching (PSM) and Covariate Balancing Propensity Score (CBPS), even when its model assumptions are violated. In a real-world application, SCOTOMA was used to estimate the causal effect of in-person instruction on community-level COVID-19 transmission rates, providing causal evidence to inform public health policy.
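Once weights are available, matching reduces to pairing units with small scores. Because the quadratic score satisfies $S_\beta(x_i,x_j)=\beta^T(x_i-x_j)(x_i-x_j)^T\beta=(\beta^T(x_i-x_j))^2$, every score is a squared difference of one-dimensional projections $\beta^T x$. The sketch below assumes $\beta$ is already known and uses a plain greedy rule (take the smallest remaining score, never reuse a unit) as a stand-in. It is not the consistency-regularized, simulated-annealing-inspired search described above, and the names `quadratic_score` and `greedy_one_to_one_match` are hypothetical.

```python
import numpy as np


def quadratic_score(beta, xi, xj):
    """S_beta(x_i, x_j) = beta^T (x_i - x_j)(x_i - x_j)^T beta, computed as (beta^T (x_i - x_j))^2."""
    d = np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float)
    return float(np.dot(np.asarray(beta, dtype=float), d) ** 2)


def greedy_one_to_one_match(beta, treated, control):
    """Hypothetical greedy one-to-one matcher: smallest remaining score first, no unit reused."""
    beta = np.asarray(beta, dtype=float)
    t_proj = np.asarray(treated, dtype=float) @ beta  # beta^T x for each treated unit
    c_proj = np.asarray(control, dtype=float) @ beta  # beta^T x for each control unit
    # scores[i, j] = S_beta(treated_i, control_j) = (projection difference)^2
    scores = (t_proj[:, None] - c_proj[None, :]) ** 2
    n_pairs = min(len(t_proj), len(c_proj))
    used_t, used_c, pairs = set(), set(), []
    # Visit every (i, j) candidate in increasing order of score.
    for flat in np.argsort(scores, axis=None, kind="stable"):
        i, j = (int(k) for k in np.unravel_index(flat, scores.shape))
        if i in used_t or j in used_c:
            continue  # one-to-one: each unit appears in at most one pair
        used_t.add(i)
        used_c.add(j)
        pairs.append((i, j))
        if len(pairs) == n_pairs:
            break
    return sorted(pairs)


if __name__ == "__main__":
    beta = [1.0, 0.0]  # toy weights: only the first covariate matters
    treated = [[0.0, 5.0], [10.0, -3.0]]
    control = [[9.8, 100.0], [0.3, 7.0], [50.0, 0.0]]
    print(greedy_one_to_one_match(beta, treated, control))  # [(0, 1), (1, 0)]
```

Greedy pairing is not guaranteed to minimize the total score over all pairs; an optimal assignment solver such as the Hungarian method is the usual alternative when that matters.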
📝 Abstract
Multivariate matching algorithms "pair" similar study units in an observational study to remove potential bias and confounding effects caused by the absence of randomization. In one-to-one multivariate matching, a large number of pairs to be matched brings both the information of a large sample and a large number of matching tasks. A matching algorithm that is efficient and relies on only limited auxiliary matching knowledge, provided by domain experts as a "training" set of paired units, is therefore of considerable practical interest. We propose a novel one-to-one matching algorithm based on the quadratic score function $S_{\beta}(x_i,x_j)= \beta^T (x_i-x_j)(x_i-x_j)^T \beta$. The weights $\beta$, which can be interpreted as a variable importance measure, are designed to minimize the score between paired training units while maximizing the score between unpaired training units. Further, in the typical but intricate case where the training set is much smaller than the unpaired set, we propose a semisupervised companion one-to-one matching algorithm (SCOTOMA) that makes the best use of the unpaired units. The proposed weight estimator is proved to be consistent when the true matching criterion is indeed the quadratic score function. When the model assumptions are violated, we demonstrate through a series of simulations that the proposed algorithm still outperforms some popular competing matching algorithms. We applied the proposed algorithm to a real-world study investigating the effect of in-person schooling on the community COVID-19 transmission rate, for policy-making purposes.
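The abstract states the goal for $\beta$ (low scores for paired training units, high scores for unpaired ones) but not the exact objective. One simple way to encode that goal, assumed here purely for illustration, is to minimize the ratio $\sum_{\text{paired}} S_\beta / \sum_{\text{unpaired}} S_\beta = \beta^T A \beta / \beta^T B \beta$, where $A$ and $B$ sum the outer products $(x_i-x_j)(x_i-x_j)^T$ over paired and unpaired training units. Its minimizer is the generalized eigenvector of $A\beta=\lambda B\beta$ with the smallest eigenvalue. The names `scatter` and `estimate_beta` and the `ridge` term are hypothetical; this is not the paper's estimator and omits SCOTOMA's semisupervised use of unpaired units.

```python
import numpy as np


def scatter(diffs):
    """Sum of outer products d d^T over the rows d of `diffs`, so beta^T M beta = sum of S_beta."""
    D = np.asarray(diffs, dtype=float)
    return D.T @ D


def estimate_beta(paired_diffs, unpaired_diffs, ridge=1e-8):
    """Hypothetical estimator: minimize sum_paired S_beta / sum_unpaired S_beta over beta.

    Each row of `paired_diffs` / `unpaired_diffs` is a difference x_i - x_j.
    """
    A = scatter(paired_diffs)    # want beta^T A beta small: paired units score low
    B = scatter(unpaired_diffs)  # want beta^T B beta large: unpaired units score high
    B = B + ridge * np.eye(B.shape[0])  # keep B positive definite for the Cholesky step
    L = np.linalg.cholesky(B)    # B = L L^T
    L_inv = np.linalg.inv(L)
    # With u = L^T beta, the ratio becomes u^T C u / u^T u, a symmetric eigenproblem.
    C = L_inv @ A @ L_inv.T
    _, eigvecs = np.linalg.eigh(C)  # eigenvalues come back in ascending order
    beta = L_inv.T @ eigvecs[:, 0]  # map the minimizing u back: beta = L^{-T} u
    beta = beta / np.linalg.norm(beta)  # the ratio is scale-invariant, so fix ||beta|| = 1
    if beta[np.argmax(np.abs(beta))] < 0:
        beta = -beta  # fix the sign so the largest-magnitude weight is positive
    return beta


if __name__ == "__main__":
    # Toy data: paired units agree on covariate 1 but not on covariate 2,
    # while unpaired units differ mainly on covariate 1.
    paired = [[0.0, 3.0], [0.0, -2.0], [0.0, 4.0]]
    unpaired = [[5.0, 1.0], [-4.0, 2.0], [6.0, -1.0]]
    print(estimate_beta(paired, unpaired))  # roughly [1, 0]
```

In this toy example, only the first covariate separates paired from unpaired units, so the estimated weights concentrate on it, matching the variable-importance reading of $\beta$.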