A Subgoal-driven Framework for Improving Long-Horizon LLM Agents

📅 2026-03-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the challenges faced by large language model (LLM) agents in long-horizon tasks such as web navigation, where dynamic environmental changes can lead to goal drift and sparse, delayed rewards impede effective learning. To enhance long-term planning and execution capabilities, the authors propose an online planning mechanism that explicitly decomposes high-level goals into subgoals at inference time, integrated within a milestone-based reinforcement learning framework termed MiRA. This approach leverages dense intermediate reward signals derived from progress toward predefined milestones. Evaluated on the WebArena-Lite benchmark, the method improves task success rates by approximately 10 percentage points (absolute) for Gemini and dramatically elevates Gemma3-12B's performance from 6.4% to 43.0%, surpassing GPT-4-Turbo, GPT-4o, and the previous open-source state of the art, WebRL.
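The dense milestone-based reward idea in the summary can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the class name, the representation of milestones as predicates over environment state, and the equal partial-credit scheme are not taken from the paper.

```python
# Hypothetical sketch of milestone-based dense rewards. The milestone
# representation (ordered predicates) and the partial-credit scheme are
# assumptions for illustration, not the paper's actual implementation.
from dataclasses import dataclass, field


@dataclass
class MilestoneTracker:
    """Tracks progress toward predefined milestones and emits dense rewards."""
    milestones: list          # ordered predicates over the environment state
    reached: set = field(default_factory=set)

    def reward(self, state) -> float:
        """Return an intermediate reward for each newly reached milestone."""
        r = 0.0
        for i, predicate in enumerate(self.milestones):
            if i not in self.reached and predicate(state):
                self.reached.add(i)          # each milestone is rewarded once
                r += 1.0 / len(self.milestones)  # dense partial credit
        return r
```

The point of the sketch is that the agent receives a nonzero signal as soon as any milestone is satisfied, rather than a single sparse reward at episode end.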

📝 Abstract
Large language model (LLM)-based agents have emerged as powerful autonomous controllers for digital environments, including mobile interfaces, operating systems, and web browsers. Web navigation, for example, requires handling dynamic content and long sequences of actions, making it particularly challenging. Existing LLM-based agents struggle with long-horizon planning in two main ways. During online execution, they often lose track as new information arrives, lacking a clear and adaptive path toward the final goal. This issue is further exacerbated during reinforcement learning (RL) fine-tuning, where sparse and delayed rewards make it difficult for agents to identify which actions lead to success, preventing them from maintaining coherent reasoning over extended tasks. To address these challenges, we propose two contributions. First, we introduce an agent framework that leverages proprietary models for online planning through subgoal decomposition. Second, we present MiRA (Milestoning your Reinforcement Learning Enhanced Agent), an RL training framework that uses dense, milestone-based reward signals. The real-time planning mechanism yields an absolute improvement of approximately 10 percentage points in success rate (SR) for proprietary models such as Gemini on the WebArena-Lite benchmark. Meanwhile, applying MiRA to the open Gemma3-12B model increases its success rate from 6.4% to 43.0%. This performance surpasses proprietary systems such as GPT-4-Turbo (17.6%) and GPT-4o (13.9%), as well as the previous open-model state of the art, WebRL (38.4%). Overall, our findings demonstrate that combining explicit inference-time planning with milestone-based rewards significantly improves an agent's long-horizon capabilities, paving the way for more robust and general-purpose autonomous systems.
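The abstract's first contribution, online planning through subgoal decomposition, can be sketched as a simple agent loop. This is an illustrative stand-in only: the `llm` callable, the prompt wording, the environment interface, and the yes/no subgoal-completion check are all hypothetical and not the paper's actual framework.

```python
# Illustrative sketch of inference-time subgoal decomposition. The `llm`
# callable, prompts, and environment interface are hypothetical stand-ins,
# not the paper's proprietary-model setup.
def run_episode(goal, env, llm, max_steps=30):
    """Decompose the goal into subgoals online and act on the current one."""
    subgoals = llm(f"Decompose into subgoals: {goal}").splitlines()
    obs = env.reset()
    for _ in range(max_steps):
        current = subgoals[0] if subgoals else goal
        # Condition each action on the overall goal AND the active subgoal,
        # so the agent keeps an explicit path toward the final objective.
        action = llm(
            f"Goal: {goal}\nSubgoal: {current}\nObservation: {obs}\nAction:"
        )
        obs, done = env.step(action)
        # Advance the plan once the active subgoal is judged complete.
        if subgoals and llm(
            f"Is subgoal '{current}' complete given {obs}? yes/no"
        ) == "yes":
            subgoals.pop(0)
        if done:
            break
    return obs
```

The design choice worth noting is that the subgoal list is consulted and updated at every step, which is what lets the agent adapt when dynamic page content invalidates its earlier plan.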
Problem

Research questions and friction points this paper is trying to address.

long-horizon planning
LLM agents
web navigation
sparse rewards
goal tracking
Innovation

Methods, ideas, or system contributions that make the work stand out.

subgoal decomposition
milestone-based rewards
long-horizon planning
reinforcement learning
LLM agents