Double Machine Learning for Static Panel Models with Fixed Effects

📅 2023-12-13
🏛️ arXiv.org
📈 Citations: 1
Influential: 1
📄 PDF
🤖 AI Summary
This paper addresses causal inference in static panel data with fixed effects under high-dimensional nonlinear confounding. We propose the first double machine learning (DML) framework tailored to the Robinson partial linear model for such settings. Our method unifies within-group estimation, first-differencing, and correlated random effects estimation without imposing distributional assumptions on fixed effects. It employs doubly robust debiasing combined with ensemble machine learning—specifically Lasso, random forests, and XGBoost—to jointly estimate nonlinear nuisance components. Simulation studies demonstrate substantial improvements in estimation accuracy and finite-sample bias reduction. In an empirical reanalysis of the effect of the UK minimum wage on voting behavior, the “first-differencing + ensemble” variant achieves optimal performance, reducing bias by over 40% relative to conventional approaches. The framework thus delivers a more robust and flexible tool for policy-relevant causal evaluation in high-dimensional nonlinear panel settings.
📝 Abstract
Recent advances in causal inference have seen the development of methods which make use of the predictive power of machine learning algorithms. In this paper, we develop novel double machine learning (DML) procedures for panel data in which these algorithms are used to approximate high-dimensional and nonlinear nuisance functions of the covariates. Our new procedures are extensions of the well-known correlated random effects, within-group and first-difference estimators from linear to nonlinear panel models, specifically, Robinson (1988)'s partially linear regression model with fixed effects and unspecified nonlinear confounding. Our simulation study assesses the performance of these procedures using different machine learning algorithms. We use our procedures to re-estimate the impact of minimum wage on voting behaviour in the UK. From our results, we recommend the use of first-differencing because it imposes the fewest constraints on the distribution of the fixed effects, and an ensemble learning strategy to ensure optimum estimator accuracy.
Problem

Research questions and friction points this paper is trying to address.

Causal Inference
Machine Learning
Panel Data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Panel Data Learning
Integrated Learning Strategy
First Difference Method
🔎 Similar Papers
No similar papers found.
P
Paul S. Clarke
Institute for Social and Economic Research, University of Essex, Colchester CO4 3SQ, UK
A
Annalivia Polselli
Institute for Analytics and Data Science, University of Essex, Colchester CO4 3ZL, UK