Semiparametric KSD test: unifying score and distance-based approaches for goodness-of-fit testing

📅 2025-12-22
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
For models with intractable likelihoods but tractable scores, existing goodness-of-fit tests lack theoretical guarantees or computational feasibility. Method: We propose the semiparametric kernelized Stein discrepancy (SKSD) test, unifying score-based and distance-based frameworks. SKSD leverages exponential tilting models, integral probability metrics, a kernelized Stein function class, the Stein identity, and a parametric bootstrap. Contribution/Results: SKSD is proven universally consistent, Pitman-efficient, and robust to general nuisance-parameter estimators. Crucially, it reveals that classical distance-based tests—including Kolmogorov–Smirnov, Wasserstein-1, and MMD—arise as special cases of score-based constructions under specific Stein operators. Empirically, SKSD achieves power competitive with specialized normality tests (e.g., Anderson–Darling, Lilliefors) and handles kernel exponential families and conditional Gaussian models. It thus offers a general-purpose goodness-of-fit test for complex models that combines strong theoretical guarantees with practical computability.
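To make the kernelized-Stein idea concrete, the sketch below estimates a squared kernelized Stein discrepancy with a V-statistic for a standard normal null. The Gaussian null, RBF kernel, and bandwidth are illustrative assumptions; this is the classical KSD construction (score times kernel plus cross-derivative terms), not the paper's full semiparametric SKSD test.

```python
import numpy as np

def stein_kernel(x, y, h=1.0):
    """Stein-modified RBF kernel u_p(x, y) for a standard normal null,
    whose score is s(x) = -x. The four terms are
    s(x)s(y)k + s(x)dk/dy + s(y)dk/dx + d2k/dxdy; h is the bandwidth."""
    d = x - y
    k = np.exp(-d**2 / (2 * h**2))
    sx, sy = -x, -y
    dkx = -d / h**2 * k                     # dk/dx
    dky = d / h**2 * k                      # dk/dy
    dkxy = (1.0 / h**2 - d**2 / h**4) * k   # d2k/dxdy
    return sx * sy * k + sx * dky + sy * dkx + dkxy

def ksd_vstat(sample, h=1.0):
    """V-statistic estimate of the squared KSD: average of u_p over all pairs."""
    X, Y = np.meshgrid(sample, sample)
    return stein_kernel(X, Y, h).mean()

rng = np.random.default_rng(0)
null_draw = rng.standard_normal(500)   # matches the N(0, 1) null
shifted = null_draw + 1.0              # mean-shifted alternative
print(ksd_vstat(null_draw), ksd_vstat(shifted))
```

Under the null the statistic concentrates near zero, while a mean shift inflates it, which is the behavior a KSD-type test thresholds via a bootstrap.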

📝 Abstract
Goodness-of-fit (GoF) tests are fundamental for assessing model adequacy. Score-based tests are appealing because they require fitting the model only once under the null. However, extending them to powerful nonparametric alternatives is difficult due to the lack of suitable score functions. Through a class of exponentially tilted models, we show that the resulting score-based GoF tests are equivalent to tests based on integral probability metrics (IPMs) indexed by a function class. When the class is rich, the test is universally consistent. This simple yet insightful perspective enables reinterpretation of classical distance-based testing procedures, including those based on the Kolmogorov-Smirnov distance, Wasserstein-1 distance, and maximum mean discrepancy, as arising from score-based constructions. Building on this insight, we propose a new nonparametric score-based GoF test through a special class of IPMs induced by a kernelized Stein function class, called the semiparametric kernelized Stein discrepancy (SKSD) test. Compared with other nonparametric score-based tests, the SKSD test is computationally efficient and accommodates general nuisance-parameter estimators, supported by a generic parametric bootstrap procedure. The SKSD test is universally consistent and attains Pitman efficiency. Moreover, the SKSD test provides simple GoF tests for models with intractable likelihoods but tractable scores with the help of Stein's identity. We use two popular models, the kernel exponential family and conditional Gaussian models, to illustrate the power of our method. Our method achieves power comparable to task-specific normality tests such as Anderson-Darling and Lilliefors, despite being designed for general nonparametric alternatives.
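The abstract's "generic parametric bootstrap procedure" for handling estimated nuisance parameters can be sketched as follows. This is a minimal generic recipe, not the paper's exact algorithm: the Gaussian null, the moment-based statistic, and the helper names (`fit`, `simulate`) are illustrative stand-ins for the SKSD statistic and model-specific fitting routines.

```python
import numpy as np

def parametric_bootstrap_pvalue(sample, statistic, fit, simulate, B=200, seed=None):
    """Generic parametric bootstrap for a GoF test with estimated
    nuisance parameters: refit the null model on each simulated
    dataset so the reference distribution reflects estimation error."""
    rng = np.random.default_rng(seed)
    theta_hat = fit(sample)
    t_obs = statistic(sample, theta_hat)
    t_boot = []
    for _ in range(B):
        boot = simulate(theta_hat, len(sample), rng)
        t_boot.append(statistic(boot, fit(boot)))  # refit on each bootstrap draw
    return (1 + sum(t >= t_obs for t in t_boot)) / (B + 1)

# Illustration with a Gaussian null: fit (mean, sd), simulate from the fit,
# and use a crude third-moment statistic (a stand-in, not the paper's SKSD).
fit = lambda x: (x.mean(), x.std())
simulate = lambda th, n, rng: rng.normal(th[0], th[1], n)
stat = lambda x, th: abs((((x - th[0]) / th[1]) ** 3).mean())

rng = np.random.default_rng(1)
gauss = rng.standard_normal(300)
print(parametric_bootstrap_pvalue(gauss, stat, fit, simulate, B=99, seed=2))
```

Because the null model is refit inside each bootstrap replication, the simulated reference distribution accounts for nuisance-parameter estimation, which is what keeps such tests calibrated.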
Problem

Research questions and friction points this paper is trying to address.

Existing goodness-of-fit tests for models with intractable likelihoods lack either theoretical guarantees or computational feasibility
Score-based tests are hard to extend to powerful nonparametric alternatives due to the lack of suitable score functions
Tests must remain valid when nuisance parameters are estimated by general procedures
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unifies score-based and distance-based goodness-of-fit testing methods
Introduces the semiparametric kernelized Stein discrepancy (SKSD) test, which is computationally efficient and Pitman-efficient
Accommodates models with intractable likelihoods but tractable scores via Stein's identity
Zhihan Huang
Department of Statistics and Data Science, University of Pennsylvania
Ziang Niu
PhD student at Wharton Statistics and Data Science Department
Hypothesis Testing · Genomics