MAP: A Map-then-Act Paradigm for Long-Horizon Interactive Agent Reasoning

📅 2026-05-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current interactive large language model (LLM) agents typically rely on goal-conditioned, step-by-step planning, so they perceive the environment only passively and fall into inefficient trial-and-error. This work proposes the Map-then-Act (MAP) paradigm, which brings cognitive map theory to interactive agents. MAP makes reasoning proactive through three stages: global exploration, task-specific map construction, and knowledge-augmented execution. Together these decouple environment understanding from immediate action. We develop a modular MAP framework and release MAP-2K, a dataset of map-then-act trajectories. On ARC-AGI-3, MAP lets frontier models surpass near-zero baseline performance in 22 of 25 game environments, and training on MAP-2K outperforms training on expert execution traces, suggesting that understanding the environment matters more than imitating behavior.

📝 Abstract

Current interactive LLM agents rely on goal-conditioned stepwise planning, where environmental understanding is acquired reactively during execution rather than established beforehand. This temporal inversion leads to Delayed Environmental Perception: agents must infer environmental constraints through trial-and-error, resulting in an Epistemic Bottleneck that traps them in inefficient failure cycles. Inspired by human affordance perception and cognitive map theory, we propose the Map-then-Act Paradigm (MAP), a plug-and-play framework that shifts environment understanding before execution. MAP consists of three stages: (1) Global Exploration, acquiring environment-general priors; (2) Task-Specific Mapping, constructing a structured cognitive map; and (3) Knowledge-Augmented Execution, solving tasks grounded on the map. Experiments show consistent gains across benchmarks and LLMs. On ARC-AGI-3, MAP enables frontier models to surpass near-zero baseline performance in 22 of 25 game environments. We further introduce MAP-2K, a dataset of map-then-act trajectories, and show that training on it outperforms expert execution traces, suggesting that understanding environments is more fundamental than imitation.
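The three stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy room environment, the function names (`explore`, `build_map`, `execute`), and the assumption that the agent may probe actions from any state it has already discovered are all hypothetical, chosen only to show how understanding the environment can be separated from acting in it.

```python
from collections import deque

# Hypothetical toy environment (not from the paper): rooms linked by actions.
# "pit" is a dead end with no way out.
TRANSITIONS = {
    ("start", "left"): "pit",
    ("start", "right"): "hall",
    ("hall", "up"): "goal",
    ("hall", "back"): "start",
}
ACTIONS = ["left", "right", "up", "back"]


def step(state, action):
    """Environment dynamics: an invalid action leaves the state unchanged."""
    return TRANSITIONS.get((state, action), state)


def explore(start):
    """Stage 1, global exploration: probe every action in every reachable state.

    Assumes the agent can probe from any state it has already discovered.
    """
    observed = {}
    seen = {start}
    frontier = deque([start])
    while frontier:
        state = frontier.popleft()
        for action in ACTIONS:
            nxt = step(state, action)
            if nxt != state:
                observed[(state, action)] = nxt
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
    return observed


def build_map(observed, goal):
    """Stage 2, task-specific mapping: keep only edges that can still lead to the goal."""
    useful = {goal}
    changed = True
    while changed:
        changed = False
        for (state, _action), nxt in observed.items():
            if nxt in useful and state not in useful:
                useful.add(state)
                changed = True
    task_map = {}
    for (state, action), nxt in observed.items():
        if nxt in useful:
            task_map.setdefault(state, {})[action] = nxt
    return task_map


def execute(task_map, start, goal):
    """Stage 3, knowledge-augmented execution: plan on the map, then act."""
    queue = deque([(start, [])])
    visited = {start}
    plan = None
    while queue:
        state, path = queue.popleft()
        if state == goal:
            plan = path
            break
        for action, nxt in task_map.get(state, {}).items():
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [action]))
    if plan is None:
        return start, None
    state = start
    for action in plan:  # replay the plan in the real environment
        state = step(state, action)
    return state, plan


observed = explore("start")
task_map = build_map(observed, "goal")
final_state, plan = execute(task_map, "start", "goal")
print(plan, final_state)
```

In this sketch the dead-end room is discovered during exploration and pruned during mapping, so execution never has to learn about it by trial and error. A purely reactive agent would have to risk stepping into it first.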

Problem

Research questions and friction points this paper is trying to address.

Delayed Environmental Perception

Epistemic Bottleneck

Interactive Agent Reasoning

Long-Horizon Planning

Cognitive Map

Innovation

Methods, ideas, or system contributions that make the work stand out.

Map-then-Act Paradigm

Cognitive Map

Environmental Perception