🤖 AI Summary
Large language models (LLMs) suffer from high failure rates on complex reasoning tasks and struggle to leverage environmental feedback effectively during inference. Method: This paper proposes a test-time, feedback-driven iterative optimization paradigm, formalizing feedback utilization as an online optimization process. Its core innovation is OpTune, a learnable test-time optimizer that combines gradient approximation with meta-learning to enable structured feedback encoding, generalization across step counts, and lightweight, parameter-efficient tuning. Contribution/Results: Evaluated on four mainstream reasoning benchmarks, OpTune achieves an average accuracy improvement of 12.7% over strong baselines, substantially outperforming conventional retry mechanisms. It further demonstrates strong length extrapolation and robust cross-model transferability, establishing a principled framework for test-time optimization of LLMs.
📄 Abstract
Solving complex tasks in a single attempt is challenging for large language models (LLMs). Success often requires iterative interaction with the environment and its feedback, making effective feedback utilization a critical topic. Existing approaches either struggle with length generalization or rely on naive retries that discard prior information. In this paper, we introduce FTTT, a novel paradigm that formulates feedback utilization as an optimization problem at test time. Additionally, we propose a learnable test-time optimizer, OpTune, to effectively exploit feedback. Experiments on two LLMs across four reasoning datasets demonstrate that FTTT and OpTune achieve superior scalability and performance.
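Neither the summary nor the abstract specifies FTTT's actual update rule, but the core idea, treating environmental feedback as the signal of an online optimization loop at test time, can be sketched generically. The snippet below is a minimal illustration under assumptions, not the paper's method: it updates a steering vector with a two-point (SPSA-style) gradient estimate, since an environment typically returns only a scalar score per attempt. All names (`fttt_loop`, `score`, `theta`) are hypothetical.

```python
import random

def fttt_loop(score, theta, steps=200, lr=0.1, eps=0.1, seed=0):
    """Hedged sketch of feedback-as-test-time-optimization (NOT the paper's
    FTTT algorithm): ascend a scalar feedback signal `score(theta)` using a
    two-point SPSA-style gradient estimate, so no true gradients are needed."""
    rng = random.Random(seed)
    for _ in range(steps):
        # Random +/-1 perturbation direction for each coordinate.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + eps * d for t, d in zip(theta, delta)]
        minus = [t - eps * d for t, d in zip(theta, delta)]
        # Difference of two scalar feedback values estimates the
        # directional derivative along `delta`.
        g = (score(plus) - score(minus)) / (2 * eps)
        # Ascend the estimated gradient (g * delta approximates grad score).
        theta = [t + lr * g * d for t, d in zip(theta, delta)]
    return theta
```

As a usage example, maximizing the toy feedback `score(th) = -((th[0]-1)**2 + (th[1]+2)**2)` from `theta = [0.0, 0.0]` drives `theta` toward the optimum `[1, -2]`, illustrating how repeated scalar feedback can substitute for a naive retry loop.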