🤖 AI Summary
This study addresses uncertainty quantification in factor extraction for high-dimensional systems, which matters when the extracted factors are interpreted directly or used to summarize a large set of predictors. It compares Principal Components (PC) and Kalman filter (KF) procedures in terms of the mean squared error (MSE) of the extracted factors for a finite cross-sectional dimension $N$, and examines how misspecifying the cross-sectional structure of the idiosyncratic components (for example, wrongly assuming homoscedasticity or zero cross-correlation) affects estimation accuracy. Using linear projection theory and Monte Carlo simulations, the work shows that treating the true factors as random variables, as KF does, rather than as deterministic, as PC implicitly does, yields smaller MSEs, so KF factor estimates are more precise than PC ones. The results are relevant for constructing confidence intervals for the factors, which is illustrated with simulated data.
📝 Abstract
Factor extraction from systems of variables with a large cross-sectional dimension, $N$, is often based on either Principal Components (PC)-based procedures or Kalman filter (KF)-based procedures. Measuring the uncertainty of the extracted factors is important when, for example, they have a direct interpretation and/or they are used to summarize the information in a large number of potential predictors. In this paper, we compare the finite $N$ mean square errors (MSEs) of PC and KF factors extracted under different structures of the idiosyncratic cross-correlations. We show that the MSEs of PC-based factors, implicitly based on treating the true underlying factors as deterministic, are larger than the corresponding MSEs of KF factors, obtained by treating the true factors as either serially independent or autocorrelated random variables. We also study and compare the MSEs of PC and KF factors estimated when the idiosyncratic components are wrongly considered as if they were cross-sectionally homoscedastic and/or uncorrelated. The relevance of the results for the construction of confidence intervals for the factors is illustrated with simulated data.
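
The MSE ordering described above can be illustrated with a minimal sketch. It is not the paper's procedure: it assumes a single factor with unit variance and no serial correlation, known loadings, and a diagonal, heteroscedastic idiosyncratic covariance. Under those assumptions, the "fixed-factor" estimator is a cross-sectional least-squares fit (a stand-in for PC with known loadings), and the "random-factor" estimator is the linear projection of the factor on the data (the static special case of a Kalman update). The function name `simulate_mse` and all parameter values are hypothetical.

```python
import numpy as np


def simulate_mse(n_series=20, n_periods=20000, seed=0):
    """Compare factor-estimation MSEs under illustrative assumptions.

    Model (assumed here): y_t = loadings * f_t + e_t, with f_t ~ N(0, 1)
    serially independent and e_t ~ N(0, diag(idio_var)).
    """
    rng = np.random.default_rng(seed)
    loadings = rng.uniform(0.5, 1.5, size=n_series)   # assumed known
    idio_var = rng.uniform(0.5, 3.0, size=n_series)   # heteroscedastic noise
    factors = rng.standard_normal(n_periods)
    noise = rng.standard_normal((n_periods, n_series)) * np.sqrt(idio_var)
    y = factors[:, None] * loadings + noise

    # Factor treated as fixed: unweighted cross-sectional least squares.
    f_fixed = y @ loadings / (loadings @ loadings)

    # Factor treated as random with prior variance 1: linear projection,
    # i.e. precision-weighted average shrunk towards the prior mean of 0.
    weights = loadings / idio_var
    f_proj = y @ weights / (1.0 + loadings @ weights)

    return {
        "fixed_emp": float(np.mean((f_fixed - factors) ** 2)),
        "proj_emp": float(np.mean((f_proj - factors) ** 2)),
        # Closed-form MSEs implied by the assumed model.
        "fixed_theory": float((loadings**2 @ idio_var) / (loadings @ loadings) ** 2),
        "proj_theory": float(1.0 / (1.0 + loadings @ weights)),
    }


if __name__ == "__main__":
    for name, value in simulate_mse().items():
        print(f"{name}: {value:.4f}")
```

In this simplified setting the projection MSE is never larger than the least-squares MSE, since the projection both weights each series by its noise precision and uses the factor's prior variance. The paper's comparison is broader, covering estimated quantities, autocorrelated factors, and cross-correlated idiosyncratic components, none of which this sketch attempts.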