🤖 AI Summary
This study systematically evaluates how well automated optimization methods improve the performance of Rocq-based formal theorem-proving agents. Focusing on key components such as prompt design, contextual knowledge, and control strategies, we apply a range of automated optimizers, including few-shot bootstrapping, to tune a proof-generation agent. To our knowledge, this work constitutes the first comprehensive empirical investigation of automated agent optimization within a real-world formal proving environment. Our results show that several optimizers yield measurable improvements, with simple few-shot bootstrapping being the most consistently effective. Nevertheless, none of the studied methods matches the performance of a carefully engineered state-of-the-art proof agent, revealing a persistent gap between automated tuning and expert manual refinement.
📝 Abstract
This work studies the applicability of automatic AI agent optimization methods to real-world agents in formal verification settings, focusing on automated theorem proving in Rocq as a representative and challenging domain. We evaluate how different automatic agent optimizers perform when applied to the task of optimizing a Rocq proof-generation agent, and assess whether parts of the fine-grained tuning of agentic systems, such as prompt design, contextual knowledge, and control strategies, can be automated. Our results show that while several optimizers yield measurable improvements, simple few-shot bootstrapping is the most consistently effective; however, none of the studied methods matches the performance of a carefully engineered state-of-the-art proof agent.