Reducing Cognitive Load in Multi-Agent Reinforcement Learning for Mathematical Problem Solving: Decoupling Reasoning and Code Generation

📅 2025-08-12
🤖 AI Summary
Problem: In single-agent mathematical reasoning systems, tightly coupling reasoning with code generation imposes excessive cognitive load on one model, which hinders robust inference.
Method: The paper proposes a decoupled two-agent collaborative framework. One agent specializes in reasoning and the other in code generation, which enables problem decomposition, isolated code execution, and reward shaping tailored to each agent's role. Both agents are trained with a combination of imitation learning and advantage-based reinforcement learning.
Results: Compared with single-agent baselines, the approach significantly increases the proportion of correct reasoning trajectories, improves accuracy on multi-step mathematical problems, and stabilizes training. Its core contribution is a decoupled, collaborative two-agent architecture with role-specific credit assignment, offering a more robust paradigm for complex reasoning tasks.
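
The role split described above can be pictured as a simple control loop: the Reasoning Agent proposes a sub-task, the Code Agent writes a program for it, and the program runs in isolation before its result is fed back. This is a minimal sketch under our own assumptions, not the paper's code. The `ReasonFn`/`CodeFn` interfaces, the `solve` and `run_isolated` helpers, and the toy agents are hypothetical names. "Isolated execution" is approximated here by running each program in a separate Python process, which is process-level isolation only and not a security sandbox.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Step:
    subgoal: str  # sub-task proposed by the Reasoning Agent
    code: str     # program written by the Code Agent for that sub-task
    output: str   # stripped stdout of the isolated run (or "timeout")
    ok: bool      # True if the program exited with status 0


# Hypothetical agent interfaces: in practice each would wrap its own LLM policy.
# ReasonFn returns (next_subgoal, None) to continue or (None, final_answer) to stop.
ReasonFn = Callable[[str, List[Step]], Tuple[Optional[str], Optional[str]]]
CodeFn = Callable[[str], str]


def run_isolated(code: str, timeout: float = 5.0) -> Tuple[bool, str]:
    """Run code in a fresh Python interpreter so a crash cannot take down the caller.

    Process-level isolation only; this is not a security sandbox.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, "timeout"
    return proc.returncode == 0, proc.stdout.strip()


def solve(problem: str, reason: ReasonFn, write_code: CodeFn, max_steps: int = 8):
    """Alternate Reasoning Agent -> Code Agent -> executor until an answer appears."""
    history: List[Step] = []
    for _ in range(max_steps):
        subgoal, answer = reason(problem, history)
        if answer is not None:
            return answer, history
        code = write_code(subgoal)
        ok, out = run_isolated(code)
        history.append(Step(subgoal, code, out, ok))
    return None, history  # step budget exhausted without a final answer


# Toy stand-ins for the two agents, for illustration only.
def toy_reason(problem: str, history: List[Step]):
    if not history:
        return "compute 17 * 23", None
    return None, history[-1].output


def toy_code(subgoal: str) -> str:
    return "print(17 * 23)"


answer, trace = solve("What is 17 * 23?", toy_reason, toy_code)
print(answer)  # 391
```

Because every program runs in its own process, a failing program only shows up as `ok=False` in the trace, and the Reasoning Agent can react to it on its next turn. How the actual system feeds execution results back to the Reasoning Agent is not specified in this summary.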

📝 Abstract
Current tool-integrated mathematical reasoning systems often adopt a single-agent paradigm, where one large language model handles problem reasoning, code generation, and code execution in an integrated workflow. While this design eases coordination, we hypothesize that it imposes cognitive load interference, as the agent must interleave long-horizon reasoning with precise program synthesis. We validate this hypothesis through a controlled comparison between a reasoning-only agent and a reasoning-plus-code agent, finding that the latter produces significantly fewer correct reasoning paths despite having tool-calling capabilities. To address this, we propose a dual-agent hybrid framework: a Reasoning Agent performs stepwise problem decomposition, and a Code Agent handles code generation and execution. Training combines imitation learning and reinforcement learning: the Code Agent receives strong rewards for matching intermediate ground-truth programs and weaker rewards for valid execution, while the Reasoning Agent is optimized chiefly via final-answer accuracy using advantage estimation to credit intermediate steps. This decoupled role design reduces cognitive interference and promotes stable reasoning-coding coordination.
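The abstract's reward design can be made concrete with a small sketch. Several details are our own assumptions. The weights `MATCH_REWARD = 1.0` and `EXEC_REWARD = 0.2` are hypothetical, because the abstract only says "strong" and "weaker". "Matching" a ground-truth intermediate program is assumed here to mean reproducing its output. The Reasoning Agent's advantage uses a group-relative baseline in the style of GRPO, which is one common estimator and not necessarily the one the authors use. Each step of a trajectory inherits that trajectory's advantage, a simple way to credit intermediate steps from a final-answer signal. All function names are illustrative.

```python
from statistics import mean, pstdev
from typing import List, Optional

# Hypothetical reward weights: the abstract only says "strong" vs. "weaker".
MATCH_REWARD = 1.0  # program reproduces the ground-truth intermediate result
EXEC_REWARD = 0.2   # program runs cleanly but does not match the reference


def code_agent_reward(executed_ok: bool, output: Optional[str],
                      reference_output: Optional[str]) -> float:
    """Shaped reward for one Code Agent step (assumes matching = equal outputs)."""
    if not executed_ok:
        return 0.0
    if reference_output is not None and output == reference_output:
        return MATCH_REWARD
    return EXEC_REWARD


def reasoning_advantages(final_answers: List[str], gold: str,
                         eps: float = 1e-6) -> List[float]:
    """Group-relative advantage (GRPO-style) from binary final-answer accuracy."""
    rewards = [1.0 if a == gold else 0.0 for a in final_answers]
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def step_advantages(traj_advantages: List[float],
                    steps_per_traj: List[int]) -> List[List[float]]:
    """Broadcast each trajectory's advantage to all of its reasoning steps."""
    return [[adv] * n for adv, n in zip(traj_advantages, steps_per_traj)]


# Four sampled trajectories for one problem whose gold answer is "391".
advs = reasoning_advantages(["391", "381", "391", "400"], gold="391")
per_step = step_advantages(advs, [3, 2, 4, 3])
print([round(a, 3) for a in advs])  # [1.0, -1.0, 1.0, -1.0]
```

With this baseline, a group in which every sample is correct (or every sample is wrong) receives zero advantage and contributes no gradient. That is a property of this particular estimator, not a claim about the paper's training setup.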
Problem

Research questions and friction points this paper is trying to address.

Reducing cognitive load in multi-agent reinforcement learning
Decoupling reasoning and code generation tasks
Improving accuracy in mathematical problem solving
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-agent hybrid framework for reasoning and coding
Combines imitation and reinforcement learning techniques
Decouples reasoning from code generation to reduce interference
Authors
Dayu Wang (Graduate, Peking University; AI)
Jiaye Yang (Baidu Inc.)
Weikang Li (Peking University)
Jiahui Liang (Baidu Inc.)
Yang Li (Baidu Inc.)