HIPIF: Hierarchical Planning and Information Folding for Long-Horizon LLM Agent Learning

Date: 2026-06-09
AI Summary
This work addresses a challenge that large language models face in long-horizon, multi-turn agent tasks: as the context keeps growing, the agent struggles to track the global task state and suffers from long-context interference, which impairs reasoning and decision-making. The authors propose an end-to-end approach that requires neither task-specific expert trajectories nor costly auxiliary models. The method uses hierarchical planning to decompose a task into explicit subgoals and an information folding mechanism to compress the history of completed subgoals, thereby reducing interference. It also introduces hierarchical reflection and a subgoal-oriented process reward scheme to stabilize subgoal generation, transition, and execution. The authors evaluate the approach on three public agent benchmarks and report that the results support its effectiveness on long-horizon tasks.
Abstract
While Large Language Models (LLMs) have demonstrated strong capabilities as autonomous agents across a wide range of tasks, their performance often degrades in multi-turn long-horizon agentic tasks. Existing methods have made progress through fine-grained credit assignment to alleviate long-horizon sparse rewards and hierarchical reinforcement learning to decompose tasks and reduce long-term dependency. However, these methods still do not directly address long-context interference, in which continuously growing histories weaken the agent's ability to track the global task state and impair subsequent reasoning and decision-making. Inspired by the way humans handle complex tasks through subgoal decomposition and completed progress summarization, we propose Hierarchical Planning and Information Folding (HIPIF) for long-horizon LLM agent learning. HIPIF trains the agent end-to-end to organize long-horizon execution around explicit subgoals while folding completed subgoal histories to reduce long-context interference. Furthermore, to stabilize subgoal-based planning and execution, HIPIF combines hierarchical reflection and subgoal-oriented process rewards to guide subgoal generation, transition, and execution, without relying on costly auxiliary models or task-specific expert trajectories. Extensive experiments on three publicly available agentic benchmarks demonstrate the validity of our method.
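The folding idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Subgoal` class, the `fold` and `build_context` helpers, and the prompt layout are all hypothetical, and the summary text is passed in by hand here, whereas in HIPIF the agent is trained to produce it. The sketch only shows the bookkeeping: once a subgoal is complete, its raw turns are dropped from the context and a short summary stays in their place.

```python
from dataclasses import dataclass, field


@dataclass
class Subgoal:
    """Hypothetical record for one subgoal and its interaction history."""
    description: str
    turns: list[str] = field(default_factory=list)  # raw action/observation text
    done: bool = False
    summary: str = ""  # short record kept after folding


def fold(subgoal: Subgoal, summary: str) -> None:
    """Mark a subgoal complete and replace its raw turns with a summary."""
    subgoal.done = True
    subgoal.summary = summary
    subgoal.turns.clear()


def build_context(task: str, subgoals: list[Subgoal]) -> str:
    """Render the prompt: summaries for finished subgoals, full turns for active ones."""
    lines = [f"Task: {task}"]
    for i, sg in enumerate(subgoals, start=1):
        if sg.done:
            lines.append(f"[done] Subgoal {i}: {sg.description} -> {sg.summary}")
        else:
            lines.append(f"[active] Subgoal {i}: {sg.description}")
            lines.extend(sg.turns)
    return "\n".join(lines)


if __name__ == "__main__":
    find = Subgoal("find the mug", ["act: open cabinet", "obs: you see a mug"])
    heat = Subgoal("heat the mug", ["act: go to microwave"])
    fold(find, "mug found in cabinet")
    print(build_context("heat a mug", [find, heat]))
```

In this toy layout the context grows with the number of subgoals plus the turns of the active subgoal, rather than with the full interaction history, which is the property the abstract attributes to folding.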
Problem

Research questions and friction points this paper is trying to address.

long-horizon tasks
long-context interference
LLM agents
task state tracking
autonomous reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Planning
Information Folding
Long-Horizon Agent
Subgoal Decomposition
Process Rewards
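The abstract does not specify how the subgoal-oriented process rewards are computed, so the following is only an assumed illustration of the general idea of densifying a sparse outcome reward with subgoal-level signals. The function name `process_rewards`, its parameters, and the bonus values are hypothetical and not taken from the paper.

```python
def process_rewards(
    subgoal_done_at: list[int],
    num_steps: int,
    task_success: bool,
    subgoal_bonus: float = 0.2,   # assumed value, for illustration only
    outcome_reward: float = 1.0,  # assumed value, for illustration only
) -> list[float]:
    """Per-step rewards: a bonus at each step that completes a subgoal,
    plus the outcome reward on the final step if the task succeeded."""
    rewards = [0.0] * num_steps
    for step in subgoal_done_at:
        if 0 <= step < num_steps:  # ignore indices outside the episode
            rewards[step] += subgoal_bonus
    if task_success and num_steps > 0:
        rewards[-1] += outcome_reward
    return rewards


if __name__ == "__main__":
    # Subgoals completed at steps 1 and 3 of a 5-step successful episode.
    print(process_rewards([1, 3], num_steps=5, task_success=True))
```

Compared with a single reward at the end of the episode, a scheme of this shape gives the learner feedback at each subgoal boundary, which is one common way to ease credit assignment over long horizons.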