🤖 AI Summary
This study addresses the tension between accelerated model discovery and reproducibility in empirical economics when using AI coding agents. While such agents can expedite model search, they often introduce hidden degrees of freedom that undermine robustness and replicability. To mitigate this, the paper introduces an auditable, open-source AI agent loop, presented as the first of its kind in this domain, that enforces a transparent workflow. The framework combines rolling evaluation, holdout-set validation, and comprehensive logging to trace the entire adaptive model-selection process, thereby distinguishing spurious, sample-specific findings from genuinely robust improvements. Across multiple independent runs, the agent consistently outperforms benchmark methods in rolling evaluations but shows variable performance on holdout sets, underscoring the critical role of transparent validation in identifying reliable empirical results.
📝 Abstract
AI coding agents make empirical specification search fast and cheap, but they also widen hidden researcher degrees of freedom. Building on an open-source agent-loop architecture, this paper recasts a minimal coding loop as a transparent protocol for empirical economics. In a forecast-combination illustration, multiple independent agent runs outperform standard benchmarks in the original rolling evaluation, but not all continue to do so on a post-search holdout. Logged search and holdout evaluation together make adaptive specification search visible and help distinguish robust improvements from sample-specific discoveries.
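The protocol the abstract describes can be sketched in miniature. The following is a hypothetical illustration, not the paper's actual implementation: two synthetic forecasters are combined with a weight chosen by grid search at each step of a rolling evaluation (with every choice logged, making the adaptive search auditable), and the final selected weight is then frozen and re-evaluated on a post-search holdout. All names (`f1`, `f2`, `split`, the equal-weight benchmark) are assumptions for the sketch.

```python
import random
import statistics

random.seed(0)

# Hypothetical data: an outcome series and two noisy forecasters.
T = 200
truth = [random.gauss(0.0, 1.0) for _ in range(T)]
f1 = [y + random.gauss(0.0, 0.8) for y in truth]  # noisier forecaster
f2 = [y + random.gauss(0.0, 0.5) for y in truth]  # better forecaster

split = 150  # search runs on [0, split); holdout is [split, T)

def mse(errors):
    """Mean squared error of a list of forecast errors."""
    return statistics.fmean(e * e for e in errors)

# Rolling search: at each step, grid-search the weight on f2 that
# minimised past squared error, log the choice, forecast one step ahead.
log = []
rolling_err, bench_err = [], []
for t in range(20, split):
    best_w = min(
        (w / 10 for w in range(11)),
        key=lambda w: mse([(1 - w) * f1[s] + w * f2[s] - truth[s]
                           for s in range(t)]),
    )
    log.append((t, best_w))  # the logged search trail
    combo = (1 - best_w) * f1[t] + best_w * f2[t]
    rolling_err.append(combo - truth[t])
    bench_err.append(0.5 * (f1[t] + f2[t]) - truth[t])  # equal-weight benchmark

# Post-search holdout: freeze the last selected weight, evaluate once.
w_final = log[-1][1]
holdout_err = [(1 - w_final) * f1[t] + w_final * f2[t] - truth[t]
               for t in range(split, T)]
holdout_bench = [0.5 * (f1[t] + f2[t]) - truth[t] for t in range(split, T)]

print(f"rolling MSE: agent={mse(rolling_err):.3f}, benchmark={mse(bench_err):.3f}")
print(f"holdout MSE: agent={mse(holdout_err):.3f}, benchmark={mse(holdout_bench):.3f}")
```

The point of the separation is that the rolling numbers reflect a search that could adapt to the sample, while the holdout numbers cannot; comparing the two, together with the logged search trail, is what lets a reader distinguish a robust improvement from a sample-specific discovery.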