🤖 AI Summary
This work investigates the average-case computational complexity of sparse linear regression (SLR), focusing on whether polynomial-time algorithms exist for ill-conditioned design matrices—e.g., those with low rank or high correlation. The authors establish the first rigorous, instance-level reduction from classical worst-case lattice problems—specifically Bounded Distance Decoding (BDD)—to SLR. Their framework directly links the condition number of the lattice problem to the restricted eigenvalue condition of the SLR design matrix. This reduction holds in both identifiable and unidentifiable regimes. Leveraging worst-case-to-average-case hardness amplification, they prove that if BDD is hard in the worst case, then SLR remains computationally intractable on average for all polynomial-time algorithms. The result bridges a fundamental gap at the intersection of high-dimensional sparse statistics and computational complexity theory, providing the first evidence of average-case hardness for SLR under realistic design matrix conditions.
📝 Abstract
Sparse linear regression (SLR) is a well-studied problem in statistics where one is given a design matrix $X \in \mathbb{R}^{m \times n}$ and a response vector $y = X\theta^* + w$ for a $k$-sparse vector $\theta^*$ (that is, $\|\theta^*\|_0 \leq k$) and small, arbitrary noise $w$, and the goal is to find a $k$-sparse $\widehat{\theta} \in \mathbb{R}^n$ that minimizes the mean squared prediction error $\frac{1}{m}\|X\widehat{\theta} - X\theta^*\|_2^2$. While $\ell_1$-relaxation methods such as basis pursuit, Lasso, and the Dantzig selector solve SLR when the design matrix is well-conditioned, no general algorithm is known, nor is there any formal evidence of hardness in an average-case setting with respect to all efficient algorithms. We give evidence of average-case hardness of SLR with respect to all efficient algorithms, assuming the worst-case hardness of lattice problems. Specifically, we give an instance-by-instance reduction from a variant of the bounded distance decoding (BDD) problem on lattices to SLR, where the condition number of the lattice basis defining the BDD instance is directly related to the restricted eigenvalue condition of the design matrix, which characterizes some of the classical statistical-computational gaps for sparse linear regression. Moreover, by appealing to worst-case to average-case reductions from the world of lattices, this shows hardness for a distribution of SLR instances; while the design matrices are ill-conditioned, the resulting SLR instances are in the identifiable regime. Furthermore, for well-conditioned (essentially) isotropic Gaussian design matrices, where Lasso is known to behave well in the identifiable regime, we show hardness of outputting any good solution in the unidentifiable regime, where there are many solutions, assuming the worst-case hardness of standard and well-studied lattice problems.
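As a concrete illustration of the well-conditioned regime the abstract contrasts with its hardness results, the following sketch solves an SLR instance with an isotropic Gaussian design using Lasso, implemented via ISTA proximal gradient descent (a standard generic solver, not a method from the paper). All dimensions, the sparsity level, the noise scale, and the regularization parameter `lam` are illustrative assumptions chosen for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 100, 200, 5  # samples, features, sparsity (illustrative choices)

# Well-conditioned, isotropic Gaussian design and a k-sparse ground truth.
X = rng.standard_normal((m, n))
theta_star = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
theta_star[support] = rng.choice([-1.0, 1.0], size=k)
w = 0.01 * rng.standard_normal(m)  # small noise
y = X @ theta_star + w

# ISTA for the Lasso objective (1/2m)||X theta - y||^2 + lam * ||theta||_1.
lam = 0.01
L = np.linalg.norm(X, 2) ** 2 / m  # Lipschitz constant of the smooth part's gradient
theta = np.zeros(n)
for _ in range(5000):
    grad = X.T @ (X @ theta - y) / m
    z = theta - grad / L
    theta = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-thresholding

# Mean squared prediction error, the quantity the abstract asks to minimize.
mse = np.mean((X @ (theta - theta_star)) ** 2)
print(f"prediction MSE: {mse:.4f}")
```

In this regime the restricted eigenvalue condition holds with high probability, so the prediction error comes out small; the paper's hardness results concern designs where this breaks down (ill-conditioned bases from BDD instances) or where the instance is unidentifiable.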