🤖 AI Summary
This work addresses the computational inefficiency of large language model (LLM) agents in multi-turn decision-making. Such agents typically generate lengthy explicit chain-of-thought (CoT) reasoning and apply it uniformly at every step. The authors propose a dual-mode reasoning framework that uses implicit (latent) reasoning for routine steps and adaptively switches to explicit CoT when deeper deliberation is required. The approach dynamically integrates implicit and explicit reasoning, learning a mode-selection policy with task success as the supervision signal. It combines action-supervised implicit learning, an adaptive switching mechanism, and latent variable modeling. Evaluated on search and tool-use tasks, the framework achieves comparable or higher accuracy while reducing generated tokens by up to 43.6% and 84.6%, respectively, substantially improving the trade-off between efficiency and performance.
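The token savings are reported as percentages without a formula. One common reading is a relative reduction against a baseline agent's generated tokens. This reading is assumed here, not taken from the paper, and the notation (T for total generated tokens) is introduced only for illustration:

$$
\text{reduction} = 1 - \frac{T_{\text{ALAR}}}{T_{\text{base}}}
$$

Under this reading, a 43.6% reduction means the agent generates about 56.4% of the baseline's tokens, and an 84.6% reduction means about 15.4%.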
📝 Abstract
Large reasoning models improve performance by generating extended chain-of-thought (CoT) reasoning, but this behavior becomes inefficient when applied to LLM agents. Current LLM agents often generate verbose textual reasoning at every decision step and allocate reasoning effort nearly uniformly across turns, leading to substantial inefficiency in multi-turn agentic trajectories. We propose Adaptive Latent Agentic Reasoning (ALAR), a dual-mode framework that uses compact latent reasoning for routine turns and selectively escalates to explicit chain-of-thought when deeper deliberation is needed. ALAR learns latent reasoning by using the agent's actions as supervision anchors. It is further optimized to use latent reasoning whenever it suffices for task success, reserving explicit CoT for harder decisions. Experiments on agentic search and tool-use benchmarks show that ALAR maintains comparable or better task accuracy while reducing generated tokens by up to 43.6% in search and 84.6% in tool use. These results demonstrate that ALAR improves the accuracy-efficiency trade-off of LLM agents by cutting unnecessary textual reasoning while preserving explicit deliberation for harder decision steps.
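The dual-mode control flow described above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not ALAR's implementation. Several pieces are assumptions introduced here: the per-turn `difficulty` score, the `threshold_selector` stand-in for the learned mode policy, the fixed CoT length, and the premise that latent turns emit no textual reasoning tokens. The sketch models only the per-turn mode choice and token accounting. It does not model latent reasoning itself or its training.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

LATENT, EXPLICIT = "latent", "explicit"


@dataclass
class TurnLog:
    mode: str         # reasoning mode chosen for this turn
    text_tokens: int  # textual tokens generated this turn (reasoning + action)


def threshold_selector(threshold: float) -> Callable[[float], str]:
    """Hypothetical stand-in for a learned mode policy: escalate to explicit
    CoT when the per-turn difficulty score reaches `threshold`."""
    def select(difficulty: float) -> str:
        return EXPLICIT if difficulty >= threshold else LATENT
    return select


def run_trajectory(difficulties: Sequence[float],
                   select: Callable[[float], str],
                   cot_tokens: int,
                   action_tokens: int) -> List[TurnLog]:
    """Simulate one multi-turn trajectory and log textual tokens per turn."""
    logs: List[TurnLog] = []
    for difficulty in difficulties:
        mode = select(difficulty)
        # Assumption: a latent turn reasons in hidden states and emits only the
        # action text; an explicit turn writes a fixed-length CoT first.
        reasoning = cot_tokens if mode == EXPLICIT else 0
        logs.append(TurnLog(mode, reasoning + action_tokens))
    return logs


def token_reduction(baseline_tokens: int, adaptive_tokens: int) -> float:
    """Relative reduction: 1 - adaptive / baseline."""
    return 1.0 - adaptive_tokens / baseline_tokens


# Toy trajectory: hypothetical difficulty scores and token counts.
difficulties = [0.1, 0.2, 0.9, 0.3]
always_explicit = run_trajectory(difficulties, lambda d: EXPLICIT,
                                 cot_tokens=200, action_tokens=30)
adaptive = run_trajectory(difficulties, threshold_selector(0.7),
                          cot_tokens=200, action_tokens=30)

base = sum(t.text_tokens for t in always_explicit)
ada = sum(t.text_tokens for t in adaptive)
print([t.mode for t in adaptive])  # ['latent', 'latent', 'explicit', 'latent']
print(base, ada, f"{token_reduction(base, ada):.1%}")  # 920 320 65.2%
```

With these toy numbers, which are unrelated to the paper's reported results, escalating on one of four turns cuts textual tokens from 920 to 320. In ALAR the mode choice is learned with task success as the signal rather than set by a fixed threshold, so this sketch only conveys the shape of the trade-off.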