Learning to Reason from Feedback at Test-Time

📅 2025-02-16
🤖 AI Summary
Large language models (LLMs) suffer from high failure rates on complex reasoning tasks and struggle to effectively leverage environmental feedback during inference. Method: This paper proposes a test-time feedback-driven iterative optimization paradigm, formalizing feedback as an online optimization process. Its core innovation is OpTune, a learnable test-time optimizer that jointly integrates gradient approximation and meta-learning to enable structured feedback encoding, generalization across step sizes, and lightweight, parameter-efficient tuning. Contribution/Results: Evaluated on four mainstream reasoning benchmarks, OpTune achieves an average accuracy improvement of 12.7% over strong baselines, substantially outperforming conventional retry mechanisms. It further demonstrates strong length extrapolation capability and robust cross-model transferability, establishing a novel, principled framework for test-time optimization of LLMs.

📝 Abstract
Solving complex tasks in a single attempt is challenging for large language models (LLMs). Iterative interaction with the environment and feedback is often required to achieve success, making effective feedback utilization a critical topic. Existing approaches either struggle with length generalization or rely on naive retries without leveraging prior information. In this paper, we introduce FTTT, a novel paradigm that formulates feedback utilization as an optimization problem at test time. Additionally, we propose a learnable test-time optimizer, OpTune, to effectively exploit feedback. Experiments on two LLMs across four reasoning datasets demonstrate that FTTT and OpTune achieve superior scalability and performance.
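
The contrast between naive retries and treating feedback as a test-time optimization signal can be illustrated with a toy sketch. This is not the paper's FTTT or OpTune implementation: the "model" here is just a list of logits over candidate answers, the feedback is a binary correct/incorrect signal, and the names `solve_with_feedback` and `naive_retry` are hypothetical. Greedy decoding is assumed so the behavior is deterministic.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def naive_retry(logits, is_correct, max_attempts=3):
    """Retry without using feedback: the model never changes,
    so greedy decoding repeats the same answer every time."""
    history = []
    for _ in range(max_attempts):
        attempt = max(range(len(logits)), key=lambda i: logits[i])
        history.append(attempt)
        if is_correct(attempt):
            return attempt, history
    return None, history


def solve_with_feedback(logits, is_correct, max_attempts=3, lr=5.0):
    """Toy test-time optimization (an assumption, not the paper's method):
    after each failed attempt, take one gradient step that lowers
    log p(failed attempt) before trying again."""
    logits = list(logits)  # copy so the caller's "model" is untouched
    history = []
    for _ in range(max_attempts):
        probs = softmax(logits)
        attempt = max(range(len(logits)), key=lambda i: logits[i])
        history.append(attempt)
        if is_correct(attempt):
            return attempt, history
        # d/d(logit_i) of log p(attempt) = 1[i == attempt] - p_i;
        # descending on it pushes probability away from the failed answer.
        for i in range(len(logits)):
            grad = (1.0 if i == attempt else 0.0) - probs[i]
            logits[i] -= lr * grad
    return None, history


if __name__ == "__main__":
    start = [2.0, 1.0, 0.5]          # the model initially prefers answer 0
    check = lambda a: a == 2         # the environment accepts only answer 2
    print(naive_retry(start, check))          # repeats answer 0 and fails
    print(solve_with_feedback(start, check))  # moves on after each failure
```

In this toy setting the naive loop spends its whole budget on the same wrong answer, while the feedback-driven loop changes the model between attempts. A sampling-based retry would behave less degenerately than the greedy one shown, but it would still ignore which earlier attempts failed.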
Problem

Research questions and friction points this paper is trying to address.

Enhancing feedback utilization in LLMs
Optimizing iterative task-solving strategies
Improving scalability and performance in reasoning tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Formulates feedback utilization as a test-time optimization problem
Learnable test-time optimizer
Superior scalability and performance