🤖 AI Summary
In high-dimensional panel data (with both large $N$ and $T$), existing frameworks lack a comparable inferential basis for assessing predictive accuracy differences between pooled and individual estimators. This paper develops the first inference method for the prediction error difference that accommodates $N gg T$ and general spatiotemporal dependence in the errors. It constructs asymptotically valid confidence intervals, relaxing classical i.i.d. and low-dimensional assumptions. The method integrates double-robust inference, adaptive cross-sectional aggregation, and long-memory-robust variance estimation. Theoretical validity is established under mild conditions—including $N/T^2 o 0$—and Monte Carlo simulations demonstrate substantially improved finite-sample coverage and precision over state-of-the-art alternatives. The core contribution is the first statistically rigorous quantification of predictive error differences between pooled and individual estimators in high-dimensional panels, thereby providing a formal foundation for model selection.
📝 Abstract
Panels with large time $(T)$ and cross-sectional $(N)$ dimensions are a key data structure in social sciences and other fields. A central question in panel data analysis is whether to pool data across individuals or to estimate separate models. Pooled estimators typically have lower variance but may suffer from bias, creating a fundamental trade-off for optimal estimation. We develop a new inference method to compare the forecasting performance of pooled and individual estimators. Specifically, we propose a confidence interval for the difference between their forecasting errors and establish its asymptotic validity. Our theory allows for complex temporal and cross-sectional dependence in the model errors and covers scenarios where $N$ can be much larger than $T$-including the independent case under the classical condition $N/T^2 o 0$. The finite-sample properties of the proposed method are examined in an extensive simulation study.