Multi$^2$: Hierarchical Multi-Agent Decision-Making with LLM-Based Agents in Interactive Environments

📅 2026-06-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the tendency of large language model (LLM) agents to suffer objective drift during long-horizon interactions, which makes their decisions fragile. The authors propose Multi$^2$, a hierarchical multi-agent decision framework that explicitly decouples high-level sub-goal generation from low-level action execution. The high-level agent is trained with supervised fine-tuning (SFT) to produce context-aware sub-goals, while the low-level agent is trained with offline-to-online reinforcement learning (RL) to execute atomic actions. According to the authors, this separation mitigates objective drift and improves robustness and coordination in multi-turn tasks. The work also introduces three hierarchical decision-making benchmark datasets for training and evaluating such agents. Across diverse interactive environments, the framework is reported to consistently outperform strong agentic baselines.

📝 Abstract

A central goal of large language model (LLM) research is to build agentic systems that can plan, act, and adapt through sustained interaction with dynamic environments. While recent LLM-based agents exhibit impressive contextual reasoning, their long-horizon decision-making remains fragile, often suffering from objective drift, where goals and plans shift over extended interactions. We introduce Multi$^2$, a hierarchical multi-agent decision-making framework that explicitly decomposes agent behavior into complementary roles. A high-level agent (System 1) focuses on context-aware sub-goal generation using supervised fine-tuning (SFT), while a low-level agent (System 2) executes atomic actions through offline-to-online reinforcement learning (RL) in interactive environments. This separation enables stable long-horizon control, mitigates objective drift, and allows efficient adaptation. Across diverse interactive environments, Multi$^2$ consistently outperforms strong agentic baselines, demonstrating improved robustness and coordination in multi-turn interaction. Beyond performance, we introduce and release three hierarchical benchmark datasets, filling a long-standing gap in training and evaluating hierarchical decision-making for LLM-based agents.
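
The two-role split described in the abstract can be illustrated with a toy control loop. This is only a minimal sketch under stated assumptions, not the authors' implementation: `LineWorld`, `HighLevelPlanner`, `LowLevelExecutor`, and `run_episode` are hypothetical names, the environment is a made-up 1-D grid, and simple hand-written rules stand in for the SFT-trained and RL-trained models.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LineWorld:
    """Hypothetical toy environment: walk along a line from 0 to `target`."""
    target: int
    position: int = 0

    def step(self, action: str) -> int:
        # Atomic actions move the agent exactly one cell.
        if action == "right":
            self.position += 1
        elif action == "left":
            self.position -= 1
        return self.position

    def done(self) -> bool:
        return self.position == self.target


class HighLevelPlanner:
    """Stand-in for the high-level agent: proposes the next sub-goal.

    Assumption: a fixed rule replaces the SFT-trained LLM here.
    """

    def __init__(self, stride: int = 3) -> None:
        self.stride = stride

    def next_subgoal(self, position: int, target: int) -> int:
        # The sub-goal is a waypoint at most `stride` cells toward the target.
        delta = target - position
        return position + max(-self.stride, min(self.stride, delta))


class LowLevelExecutor:
    """Stand-in for the low-level agent: picks one atomic action.

    Assumption: a greedy rule replaces the RL-trained policy here.
    """

    def act(self, position: int, subgoal: int) -> str:
        return "right" if subgoal > position else "left"


def run_episode(env: LineWorld, planner: HighLevelPlanner,
                executor: LowLevelExecutor, max_steps: int = 100) -> List[int]:
    """Alternate between sub-goal generation and atomic execution."""
    subgoals: List[int] = []
    steps = 0
    while not env.done() and steps < max_steps:
        subgoal = planner.next_subgoal(env.position, env.target)
        subgoals.append(subgoal)
        # The executor sees only the current sub-goal, never the final target.
        while env.position != subgoal and steps < max_steps:
            env.step(executor.act(env.position, subgoal))
            steps += 1
    return subgoals
```

In this sketch the planner is re-queried only after each sub-goal is reached, so the executor's short-horizon choices never overwrite the overall objective. That is the intuition behind separating the roles; the actual interface between the two agents in Multi$^2$ may differ.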

Problem

Research questions and friction points this paper is trying to address.

objective drift

long-horizon decision-making

LLM-based agents

hierarchical decision-making

interactive environments

Innovation

Methods, ideas, or system contributions that make the work stand out.

hierarchical multi-agent

LLM-based agents

objective drift mitigation