ReVISE: Learning to Refine at Test-Time via Intrinsic Self-Verification

📅 2025-02-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a fundamental limitation of large language models (LLMs): the lack of intrinsic self-correction during inference. To this end, we propose a test-time self-correction framework that requires no external models or reinforcement learning. Methodologically: (1) we introduce an intrinsic self-verification-driven refinement mechanism that enables real-time evaluation and dynamic reconstruction of reasoning paths; (2) we design a structured curriculum training paradigm based on online preference learning to improve the generalizability of correction policies; and (3) we incorporate confidence-aware decoding together with a reasoning-path sampling-and-comparison mechanism to improve correction reliability. Evaluated across diverse complex reasoning benchmarks, our approach achieves substantial gains in answer accuracy while remaining efficient, lightweight, and end-to-end intrinsic in its verification and correction. This establishes a new paradigm for developing LLMs with human-like reflective reasoning abilities.
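The verify-then-refine loop the summary describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: `generate`, `verify`, and `refine` are hypothetical stand-ins for LLM calls, and the toy functions below exist only to make the loop runnable.

```python
def self_refine(problem, generate, verify, refine, max_rounds=3):
    """Generate a reasoning trace, then repeatedly self-verify and
    regenerate until the model accepts its own answer (or the round
    budget is exhausted)."""
    trace = generate(problem)
    for _ in range(max_rounds):
        if verify(problem, trace):      # intrinsic self-verification
            break                       # the model accepts its reasoning
        trace = refine(problem, trace)  # rethink the trajectory
    return trace

# Toy stand-ins for the LLM calls, for illustration only.
def toy_generate(p):
    return {"answer": 0}

def toy_verify(p, t):
    return t["answer"] == p["target"]

def toy_refine(p, t):
    return {"answer": t["answer"] + 1}

result = self_refine({"target": 2}, toy_generate, toy_verify, toy_refine)
```

Because verification and refinement are both performed by the same model, extra rounds translate directly into test-time compute scaling without any external verifier.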

📝 Abstract
Self-awareness, i.e., the ability to assess and correct one's own generation, is a fundamental aspect of human intelligence, making its replication in large language models (LLMs) an important yet challenging task. Previous works tackle this by employing extensive reinforcement learning or by relying on large external verifiers. In this work, we propose Refine via Intrinsic Self-Verification (ReVISE), an efficient and effective framework that enables LLMs to self-correct their outputs through self-verification. The core idea of ReVISE is to enable LLMs to verify their reasoning processes and continually rethink reasoning trajectories based on that verification. We introduce a structured curriculum based on online preference learning to implement this efficiently. Specifically, as ReVISE involves two challenging tasks (i.e., self-verification and reasoning correction), we tackle each task sequentially via curriculum learning, collecting both failed and successful reasoning paths to construct preference pairs for efficient training. During inference, our approach enjoys natural test-time scaling by integrating self-verification and correction capabilities, further enhanced by our proposed confidence-aware decoding mechanism. Our experiments on various reasoning tasks demonstrate that ReVISE achieves efficient self-correction and significantly improves reasoning performance.
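The abstract's preference-pair construction, pairing failed and successful reasoning paths on the same problem, can be sketched like this. The function name and record fields are hypothetical; the paper's actual data format and curriculum stages are not specified here.

```python
def build_preference_pairs(samples):
    """Pair each successful reasoning path with a failed one on the
    same problem, yielding (chosen, rejected) data suitable for
    online preference learning (e.g., a DPO-style objective)."""
    by_problem = {}
    for s in samples:
        by_problem.setdefault(s["problem"], []).append(s)

    pairs = []
    for problem, paths in by_problem.items():
        wins = [p for p in paths if p["correct"]]
        fails = [p for p in paths if not p["correct"]]
        for w in wins:
            for f in fails:
                pairs.append({"problem": problem,
                              "chosen": w["trace"],
                              "rejected": f["trace"]})
    return pairs

samples = [
    {"problem": "q1", "trace": "path A", "correct": True},
    {"problem": "q1", "trace": "path B", "correct": False},
    {"problem": "q2", "trace": "path C", "correct": True},  # no failure to pair
]
pairs = build_preference_pairs(samples)
```

Note that problems with only successes (or only failures) contribute no pairs, which is why sampling diverse reasoning paths per problem matters for this style of training.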
Problem

Research questions and friction points this paper is trying to address.

Enable LLMs to self-correct outputs
Implement self-verification in reasoning
Improve reasoning via curriculum learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Intrinsic self-verification framework
Curriculum learning for correction
Confidence-aware decoding mechanism
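The confidence-aware decoding idea can be illustrated with a small sketch: sample several reasoning paths, score each by the model's own confidence in its verification verdict, and keep the best. The scoring proxy below (mean log-probability of the verdict tokens) is an assumption for illustration; the paper's exact confidence measure may differ.

```python
def confidence_aware_decode(candidates):
    """Select among sampled reasoning paths using the model's own
    self-verification confidence.

    Each candidate carries `verdict_logprobs`, the (hypothetical)
    token log-probs of the model's 'this reasoning is correct'
    verdict; higher mean log-prob means higher confidence.
    """
    def confidence(c):
        lps = c["verdict_logprobs"]
        return sum(lps) / len(lps)
    return max(candidates, key=confidence)

candidates = [
    {"answer": "42", "verdict_logprobs": [-0.2, -0.1]},  # confident verdict
    {"answer": "41", "verdict_logprobs": [-1.5, -2.0]},  # unsure verdict
]
best = confidence_aware_decode(candidates)
```

Compared with plain majority voting, this weights each sampled path by how strongly the model itself endorses it, so a single confidently verified path can beat several hesitant ones.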