Robust and Sparse Generalized Linear Models for High-Dimensional Data via Maximum Mean Discrepancy

šŸ“… 2026-02-24
šŸ“ˆ Citations: 0
✨ Influential: 0
šŸ¤– AI Summary
This work addresses the challenge of high-dimensional generalized linear models (GLMs) under heavy-tailed noise or outliers, where conventional ℓ₁-regularized methods like Lasso suffer from estimation bias and poor robustness in variable selection. The authors propose a novel ℓ₁-regularized estimator that integrates maximum mean discrepancy (MMD) into the high-dimensional GLM framework, offering both statistical robustness and computational efficiency. The method features an exact O(n²) formulation and a scalable O(n) approximation, with its non-convex optimization problem solved via a combination of ADMM and AdaGrad. Theoretical analysis and extensive experiments demonstrate that the proposed approach significantly outperforms existing regularized and robust methods in Gaussian linear and logistic regression settings, particularly excelling under heavy-tailed errors and high-leverage contamination.
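The ADMM-plus-AdaGrad combination mentioned above splits the ℓ₁ penalty from the smooth (but non-convex) MMD term. A minimal sketch of the two standard building blocks such a scheme relies on, the soft-thresholding proximal update for the ℓ₁ block and an AdaGrad step for the smooth block; the function names, step sizes, and update form here are illustrative assumptions, not the paper's exact algorithm:

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1; this closed form solves the
    # l1 subproblem that arises in the ADMM splitting.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def adagrad_step(beta, grad, accum, lr=0.1, eps=1e-8):
    # AdaGrad update for the smooth (MMD) block: per-coordinate step
    # sizes shrink as squared gradients accumulate in `accum`.
    accum = accum + grad ** 2
    beta = beta - lr * grad / (np.sqrt(accum) + eps)
    return beta, accum
```

In a full ADMM loop these two updates would alternate with a dual-variable step; the sketch only shows the per-block primitives.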

šŸ“ Abstract
High-dimensional datasets are frequently subject to contamination by outliers and heavy-tailed noise, which can severely bias standard regularized estimators like the Lasso. While Maximum Mean Discrepancy (MMD) has recently been introduced as a"universal"framework for robust regression, its application to high-dimensional Generalized Linear Models (GLMs) remains largely unexplored, particularly regarding variable selection. In this paper, we propose a penalized MMD framework for robust estimation and feature selection in GLMs. We introduce an $\ell_1$-penalized MMD objective and develop two versions of the estimator: a full $O(n^2)$ version and a computationally efficient $O(n)$ approximation. To solve the resulting non-convex optimization problem, we employ an algorithm based on the Alternating Direction Method of Multipliers (ADMM) combined with AdaGrad. Through extensive simulation studies involving Gaussian linear regression and binary logistic regression, we demonstrate that our proposed methods significantly outperform classical penalized GLMs and existing robust benchmarks. Our approach shows particular strength in handling high-leverage points and heavy-tailed error distributions, where traditional methods often fail.
Problem

Research questions and friction points this paper is trying to address.

high-dimensional data
outliers
heavy-tailed noise
Generalized Linear Models
robust estimation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Maximum Mean Discrepancy
Robust GLM
Sparse Estimation
ADMM
High-dimensional Data
Xiaoning Kang
Institute of Supply Chain Analytics and International Business College, Dongbei University of Finance and Economics, 217 Jianshan Street, Dalian, 116025, Liaoning, China
Lulu Kang
Associate Professor, University of Massachusetts Amherst
statistical design of experiments, statistical learning, uncertainty quantification, operations