Consistent model selection in the spiked Wigner model via AIC-type criteria

📅 2023-07-24
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses consistent estimation of the signal rank (i.e., the number $k$ of spikes) in the high-dimensional spiked Wigner model. It studies AIC-type model selection criteria with penalty parameter $\gamma$ and establishes a phase transition in their consistency: strong consistency holds if $\gamma > 2$ (provided the smallest spike exceeds a threshold $\lambda_{\gamma}$ strictly above the BBP threshold), the criterion almost surely overestimates $k$ if $\gamma < 2$, and weak consistency is achieved when $\gamma = 2 + \delta_N$ with $\delta_N \to 0$ and $\delta_N \gg N^{-2/3}$. Furthermore, a soft-minimization variant of AIC, which picks the least complex model whose AIC score is close to the minimum, is shown to be strongly consistent even though AIC itself ($\gamma = 2$) is not. The approach combines maximum likelihood estimation with random matrix theory (the BBP phase transition and GOE asymptotics) and a generalized Wigner representation. The framework is extended to the balanced stochastic block model, under some sparsity restrictions, for consistent estimation of the number of communities.
📝 Abstract
Consider the spiked Wigner model \[ X = \sum_{i = 1}^k \lambda_i u_i u_i^\top + \sigma G, \] where $G$ is an $N \times N$ GOE random matrix, and the eigenvalues $\lambda_i$ are all spiked, i.e. above the Baik-Ben Arous-Péché (BBP) threshold $\sigma$. We consider AIC-type model selection criteria of the form \[ -2 \, (\text{maximised log-likelihood}) + \gamma \, (\text{number of parameters}) \] for estimating the number $k$ of spikes. For $\gamma > 2$, the above criterion is strongly consistent provided $\lambda_k > \lambda_{\gamma}$, where $\lambda_{\gamma}$ is a threshold strictly above the BBP threshold, whereas for $\gamma < 2$, it almost surely overestimates $k$. Although AIC (which corresponds to $\gamma = 2$) is not strongly consistent, we show that taking $\gamma = 2 + \delta_N$, where $\delta_N \to 0$ and $\delta_N \gg N^{-2/3}$, results in a weakly consistent estimator of $k$. We further show that a soft minimiser of AIC, where one chooses the least complex model whose AIC score is close to the minimum AIC score, is strongly consistent. Based on a spiked (generalised) Wigner representation, we also develop similar model selection criteria for consistently estimating the number of communities in a balanced stochastic block model under some sparsity restrictions.
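As a rough illustration of the criterion described in the abstract, the following Python sketch simulates a spiked Wigner matrix and minimises a penalised score over candidate ranks. It rests on assumptions that the abstract does not state: the GOE is normalised so that off-diagonal entries have variance $1/N$ (bulk edge at $2\sigma$), $\sigma$ is treated as known, the signal is positive semidefinite so the fit keeps the $k$ largest eigenvalues, and the parameter count is taken to be $kN - k(k-1)/2$. The function names are hypothetical, and the paper's exact likelihood and parameter count may differ.

```python
import numpy as np


def sample_spiked_wigner(n, spikes, sigma, rng):
    """Draw X = sum_i lambda_i u_i u_i^T + sigma * G with random orthonormal u_i."""
    spikes = np.asarray(spikes, dtype=float)
    # Assumed GOE normalisation: off-diagonal variance 1/n, diagonal variance 2/n.
    a = rng.standard_normal((n, n))
    g = (a + a.T) / np.sqrt(2.0 * n)
    # Orthonormal spike directions from the QR factorisation of a Gaussian matrix.
    u, _ = np.linalg.qr(rng.standard_normal((n, spikes.size)))
    return (u * spikes) @ u.T + sigma * g


def aic_type_scores(x, sigma, gamma, k_max):
    """Score(k) = -2 * (maximised log-likelihood, up to a constant) + gamma * (parameter count)."""
    n = x.shape[0]
    evals = np.linalg.eigvalsh(x)[::-1]  # eigenvalues, largest first
    ks = np.arange(k_max + 1)
    # Residual sum of squares after keeping the k largest eigenvalues (assumes a PSD signal).
    captured = np.concatenate(([0.0], np.cumsum(evals[:k_max] ** 2)))
    residual = np.sum(evals ** 2) - captured
    # Assumed parameter count: k eigenvalues plus k orthonormal directions in R^n.
    n_params = ks * n - ks * (ks - 1) / 2.0
    return n * residual / (2.0 * sigma ** 2) + gamma * n_params


def select_rank(x, sigma, gamma, k_max):
    """Return the candidate rank in {0, ..., k_max} with the smallest score."""
    return int(np.argmin(aic_type_scores(x, sigma, gamma, k_max)))


rng = np.random.default_rng(0)
x = sample_spiked_wigner(n=400, spikes=[4.0, 3.0], sigma=1.0, rng=rng)
print(select_rank(x, sigma=1.0, gamma=3.0, k_max=10))
```

Under these assumptions, adding a $k$-th component lowers the score roughly when the $k$-th largest eigenvalue exceeds $\sqrt{2\gamma}\,\sigma$. At $\gamma = 2$ this cutoff coincides with the bulk edge $2\sigma$, which is one way to see why $\gamma = 2$ is the boundary case between overestimation and consistency.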
Problem

Research questions and friction points this paper is trying to address.

Estimating the number of spikes in the Wigner model.
Developing AIC-type criteria for model selection.
Consistent estimation of communities in block models.
Innovation

Methods, ideas, or system contributions that make the work stand out.

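AIC-type criteria with a tunable penalty parameter $\gamma$ for model selection.
Strong consistency for $\gamma > 2$, provided the smallest spike exceeds a threshold $\lambda_{\gamma}$ strictly above the BBP threshold.
Soft minimiser of AIC that selects the least complex model whose score is close to the minimum.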
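The soft minimiser idea can be sketched in a few lines of Python: given the scores of candidate models ordered from least to most complex, pick the first one whose score is within a tolerance of the minimum. The tolerance `tol` is a placeholder assumption, since this summary does not say how "close to the minimum" is quantified, and `soft_argmin` is a hypothetical name.

```python
import numpy as np


def soft_argmin(scores, tol):
    """Index of the least complex model whose score is within `tol` of the minimum.

    `scores[k]` is the criterion value of the k-th candidate, ordered by
    increasing complexity. `tol` is an illustrative tolerance, not a value
    taken from the paper.
    """
    scores = np.asarray(scores, dtype=float)
    close = scores <= scores.min() + tol
    # argmax on a boolean array returns the position of the first True entry.
    return int(np.argmax(close))


scores = [10.0, 3.0, 1.2, 1.0, 1.5]
print(soft_argmin(scores, tol=0.5))  # soft minimiser
print(soft_argmin(scores, tol=0.0))  # ordinary (hard) minimiser
```

With `tol=0.0` the function reduces to the ordinary minimiser; a positive tolerance can only move the choice towards a simpler model, which is the direction needed to counteract overestimation.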