🤖 AI Summary
Large language models (LLMs) suffer from high failure rates on complex reasoning tasks and struggle to leverage environmental feedback effectively during inference. Method: This paper proposes a test-time, feedback-driven iterative optimization paradigm, formalizing feedback utilization as an online optimization process. Its core innovation is OpTune, a learnable test-time optimizer that combines gradient approximation with meta-learning to enable structured feedback encoding, generalization across step counts, and lightweight, parameter-efficient tuning. Contribution/Results: Evaluated on four mainstream reasoning benchmarks, OpTune achieves an average accuracy improvement of 12.7% over strong baselines, substantially outperforming conventional retry mechanisms. It further demonstrates strong length extrapolation and robust cross-model transferability, establishing a principled framework for test-time optimization of LLMs.
📄 Abstract
Solving complex tasks in a single attempt is challenging for large language models (LLMs). Success often requires iterative interaction with the environment and its feedback, making effective feedback utilization a critical topic. Existing approaches either struggle with length generalization or rely on naive retries that discard prior information. In this paper, we introduce FTTT, a novel paradigm that formulates feedback utilization as an optimization problem at test time. Additionally, we propose a learnable test-time optimizer, OpTune, to effectively exploit feedback. Experiments on two LLMs across four reasoning datasets demonstrate that FTTT and OpTune achieve superior scalability and performance.
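Neither the summary nor the abstract specifies FTTT's actual update rule, but the core idea, treating environmental feedback as the signal of an online optimization loop at test time, can be sketched generically. The snippet below is a minimal illustration under assumptions, not the paper's method: it updates a steering vector with a two-point (SPSA-style) gradient estimate, since an environment typically returns only a scalar score per attempt. All names (`fttt_loop`, `score`, `theta`) are hypothetical.

```python
import random

def fttt_loop(score, theta, steps=200, lr=0.1, eps=0.1, seed=0):
    """Hedged sketch of feedback-as-test-time-optimization (NOT the paper's
    FTTT algorithm): ascend a scalar feedback signal `score(theta)` using a
    two-point SPSA-style gradient estimate, so no true gradients are needed."""
    rng = random.Random(seed)
    for _ in range(steps):
        # Random +/-1 perturbation direction for each coordinate.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + eps * d for t, d in zip(theta, delta)]
        minus = [t - eps * d for t, d in zip(theta, delta)]
        # Difference of two scalar feedback values estimates the
        # directional derivative along `delta`.
        g = (score(plus) - score(minus)) / (2 * eps)
        # Ascend the estimated gradient (g * delta approximates grad score).
        theta = [t + lr * g * d for t, d in zip(theta, delta)]
    return theta
```

As a usage example, maximizing the toy feedback `score(th) = -((th[0]-1)**2 + (th[1]+2)**2)` from `theta = [0.0, 0.0]` drives `theta` toward the optimum `[1, -2]`, illustrating how repeated scalar feedback can substitute for a naive retry loop.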