ReMind: Understanding Deductive Code Reasoning in LLMs

📅 2025-11-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) exhibit significant limitations in deductive code reasoning—i.e., precisely tracking program execution and state evolution—due to generation bias, misalignment between reasoning and execution capabilities, and poor zero-shot generalization. To address these issues, we propose ReMind, a multi-agent framework comprising three specialized agents: (1) the Mutator, which generates semantically equivalent code variants to mitigate source-code bias; (2) the Executor, which performs step-by-step execution and monitors variable states to expose reasoning inconsistencies; and (3) the Inspector, which identifies erroneous reasoning steps and refines control-flow logic. These agents jointly enable dynamic correction and controllable optimization of the reasoning process. Extensive experiments across two code-reasoning benchmarks and five mainstream LLMs demonstrate that ReMind substantially improves deductive reasoning accuracy and exhibits strong zero-shot generalization. To our knowledge, this is the first work to systematically integrate execution-feedback loops into LLM-based code reasoning enhancement.

📝 Abstract
Large Language Models (LLMs) have achieved remarkable progress in code-related tasks. Despite this advancement, empirical evidence reveals that they still struggle with deductive code reasoning, the ability to reason about the program execution process. While prior studies have recognized this limitation, the underlying causes remain largely underexplored. In this paper, we begin by presenting a comprehensive empirical study that reveals three key challenges undermining deductive code reasoning: (1) an intrinsic gap between generation and reasoning abilities, (2) a consistent bias towards code sources, and (3) weak zero-shot generalization on complex benchmarks. In light of these challenges, we propose ReMind, a multi-agent framework composed of a Mutator, an Executor, and an Inspector. The Mutator generates code variants to mitigate bias towards code sources, the Executor traces variable states step-by-step to expose inconsistencies, and the Inspector identifies problematic reasoning steps and provides control-flow refinement to bridge the intrinsic reasoning gap. Through their coordinated collaboration, ReMind systematically identifies and refines reasoning flaws, achieving strong performance and robust zero-shot generalization. Extensive experiments on two benchmarks with five LLMs demonstrate the advantages of ReMind over baseline approaches in deductive code reasoning.
Problem

Research questions and friction points this paper is trying to address.

Addressing deductive code reasoning limitations in LLMs
Mitigating bias towards code sources in reasoning
Improving zero-shot generalization on complex benchmarks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mutator generates code variants to mitigate bias
Executor traces variable states step-by-step
Inspector identifies problematic reasoning steps for refinement
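The paper does not include an implementation, but the Executor's role of "tracing variable states step-by-step" has a concrete analogue in instrumented execution. As a rough, hypothetical sketch (the function name `trace_variable_states` and the sample function are illustrative, not from the paper), a Python tracer can record local-variable snapshots at each executed line, producing exactly the kind of ground-truth state trace against which an LLM's step-by-step reasoning could be checked for inconsistencies:

```python
import sys

def trace_variable_states(func, *args):
    """Run func(*args) and record a (line_number, locals) snapshot
    for every line executed inside func's body."""
    states = []

    def tracer(frame, event, arg):
        # Only record line events inside the target function's frame.
        if event == "line" and frame.f_code is func.__code__:
            states.append((frame.f_lineno, dict(frame.f_locals)))
        return tracer  # keep tracing nested frames

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)
    return result, states

def sum_to(n):
    # Toy program whose state evolution we want to observe.
    total = 0
    for i in range(1, n + 1):
        total += i
    return total

result, trace = trace_variable_states(sum_to, 3)
# result is the return value; trace holds per-line variable snapshots,
# e.g. successive values of `total` and `i` across loop iterations.
```

A reasoning inconsistency would then surface as a divergence between the model's claimed variable values at some step and the recorded snapshot for the corresponding line.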
Jun Gao
Zhejiang University, China
Yun Peng
The Chinese University of Hong Kong, China
Xiaoxue Ren
Zhejiang University
Software Engineering