🤖 AI Summary
Existing API-level prompt optimization methods typically edit prompts holistically, leading to entangled components, poor attributability, limited controllability, and high token overhead. To address these limitations, this work proposes the Adaptive Prompt Structure Factorization (aPSF) framework, which, without requiring access to model internals, employs an Architect model to automatically discover task-relevant semantic prompt structures. aPSF introduces an error-guided factor selection mechanism and factor-level interventional scoring to enable targeted, single-factor updates. The approach substantially improves sample efficiency and controllability, achieving up to a 2.16 percentage point average accuracy gain across multiple advanced reasoning benchmarks. On MultiArith, it reduces optimization token consumption by 45–87% and attains peak validation performance in a single update step.
📝 Abstract
Automated prompt optimization is crucial for eliciting reliable reasoning from large language models (LLMs), yet most API-only prompt optimizers iteratively edit monolithic prompts, coupling components, obscuring credit assignment, limiting controllability, and wasting tokens. We propose Adaptive Prompt Structure Factorization (aPSF), an API-only framework (prompt-in/text-out; no access to model internals) that uses an Architect model to discover task-specific prompt structures as semantic factors. aPSF then performs interventional, single-factor updates: interventional factor-level scoring estimates each factor's marginal contribution via validation-performance changes, and error-guided factor selection routes updates to the current dominant failure source for more sample-efficient optimization. Across multiple advanced reasoning benchmarks, aPSF outperforms strong baselines, including principle-aware optimizers, improving average accuracy by up to 2.16 percentage points, and reduces optimization token cost by 45–87% on MultiArith while reaching peak validation performance in a single update step.
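The abstract's update loop (factor the prompt, score factors by intervention, then edit a single factor) can be sketched as follows. This is a hedged illustration, not the authors' code: the factor names, the candidate-variant source, and the toy validation function `toy_eval` are all hypothetical stand-ins (a real system would query an LLM API for both candidate generation and validation accuracy), and selection here is simplified to "largest validation gain" rather than the paper's error-guided routing.

```python
# Hypothetical sketch of aPSF-style single-factor updates (illustrative only).
# A prompt is represented as named semantic factors; one optimization step
# (1) intervenes on one factor at a time and records the validation delta,
# then (2) edits only the single most beneficial factor.

from typing import Callable, Dict

def score_factors(prompt: Dict[str, str],
                  variants: Dict[str, str],
                  evaluate: Callable[[Dict[str, str]], float]) -> Dict[str, float]:
    """Interventional factor-level scoring: swap in one candidate factor
    at a time and measure the validation-performance change."""
    base = evaluate(prompt)
    deltas = {}
    for name, candidate in variants.items():
        intervened = dict(prompt)          # leave all other factors fixed
        intervened[name] = candidate
        deltas[name] = evaluate(intervened) - base
    return deltas

def single_factor_update(prompt, variants, evaluate):
    """Apply only the intervention with the largest positive validation
    gain (a simplified proxy for error-guided factor selection)."""
    deltas = score_factors(prompt, variants, evaluate)
    best = max(deltas, key=deltas.get)
    if deltas[best] > 0:
        prompt = dict(prompt)
        prompt[best] = variants[best]
    return prompt, deltas

# Toy stand-in for validation accuracy; a real run would evaluate the
# assembled prompt on held-out examples via the model API.
def toy_eval(p: Dict[str, str]) -> float:
    score = 0.6
    if "step by step" in p["reasoning"]:
        score += 0.2
    if "boxed" in p["format"]:
        score += 0.1
    return score

prompt = {"role": "You are a careful math tutor.",
          "reasoning": "Answer directly.",
          "format": "Reply with a number."}
variants = {"reasoning": "Think step by step before answering.",
            "format": "Put the final answer in \\boxed{}."}

updated, deltas = single_factor_update(prompt, variants, toy_eval)
# Only the "reasoning" factor is rewritten; "format" stays untouched,
# keeping the edit attributable to a single factor.
```

Because each step changes exactly one factor, any validation gain or regression is attributable to that factor, which is the credit-assignment property the abstract contrasts with monolithic prompt editing.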