MAP: A Map-then-Act Paradigm for Long-Horizon Interactive Agent Reasoning

📅 2026-05-13
📈 Citations: 0
Influential: 0

🤖 AI Summary
Current interactive large language model agents often rely on goal-conditioned step-by-step planning, which leaves them perceiving the environment passively and learning its constraints through inefficient trial and error. This work proposes the Map-then-Act (MAP) paradigm, which the authors present as the first application of cognitive map theory to interactive agents. MAP enables proactive reasoning through a three-stage process of global exploration, task-specific map construction, and knowledge-augmented execution, decoupling environment understanding from immediate action. The authors develop a modular MAP framework and release the MAP-2K dataset of map-then-act trajectories. Evaluated on benchmarks such as ARC-AGI-3, MAP substantially improves performance: frontier models rise above near-zero baseline performance in 22 of 25 game environments, and training on MAP-2K yields better results than training on expert execution traces, which suggests that understanding the environment matters more than behavioral imitation.
📝 Abstract
Current interactive LLM agents rely on goal-conditioned stepwise planning, where environmental understanding is acquired reactively during execution rather than established beforehand. This temporal inversion leads to Delayed Environmental Perception: agents must infer environmental constraints through trial and error, resulting in an Epistemic Bottleneck that traps them in inefficient failure cycles. Inspired by human affordance perception and cognitive map theory, we propose the Map-then-Act Paradigm (MAP), a plug-and-play framework that moves environment understanding ahead of execution. MAP consists of three stages: (1) Global Exploration, acquiring environment-general priors; (2) Task-Specific Mapping, constructing a structured cognitive map; and (3) Knowledge-Augmented Execution, solving tasks grounded in the map. Experiments show consistent gains across benchmarks and LLMs. On ARC-AGI-3, MAP enables frontier models to surpass near-zero baseline performance in 22 of 25 game environments. We further introduce MAP-2K, a dataset of map-then-act trajectories, and show that training on it outperforms training on expert execution traces, suggesting that understanding environments is more fundamental than imitation.
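The three stages named in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the authors' framework: a hypothetical `GridEnv` toy grid with hidden walls stands in for the environment, exhaustive probing stands in for LLM-driven exploration, and a breadth-first search stands in for map construction. The only point is the ordering: the agent learns how the environment behaves, builds a task-specific map, and only then acts.

```python
from collections import deque


class GridEnv:
    """Hypothetical toy environment (not from the paper): a grid with walls unknown to the agent."""

    ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

    def __init__(self, size=4, walls=frozenset({(1, 1), (2, 1), (1, 3)})):
        self.size = size
        self.walls = walls

    def step(self, state, action):
        """Return the next state; a move into a wall or off the grid leaves the state unchanged."""
        dr, dc = self.ACTIONS[action]
        nxt = (state[0] + dr, state[1] + dc)
        inside = 0 <= nxt[0] < self.size and 0 <= nxt[1] < self.size
        return nxt if inside and nxt not in self.walls else state


def global_exploration(env, start):
    """Stage 1 (assumed form): probe every action in every reachable state, record what happens."""
    transitions, frontier, seen = {}, deque([start]), {start}
    while frontier:
        state = frontier.popleft()
        for action in env.ACTIONS:
            nxt = env.step(state, action)
            transitions[(state, action)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return transitions


def task_specific_map(transitions, start, goal):
    """Stage 2 (assumed form): distil the recorded transitions into a shortest plan for one task."""
    graph = {}
    for (src, action), nxt in transitions.items():
        graph.setdefault(src, []).append((action, nxt))
    parent, frontier = {start: None}, deque([start])
    while frontier:
        state = frontier.popleft()
        if state == goal:
            break
        for action, nxt in graph.get(state, []):
            if nxt not in parent:
                parent[nxt] = (state, action)
                frontier.append(nxt)
    if goal not in parent:
        return None  # the map shows the goal cannot be reached
    plan, node = [], goal
    while parent[node] is not None:
        prev, action = parent[node]
        plan.append(action)
        node = prev
    return plan[::-1]


def knowledge_augmented_execution(env, start, plan):
    """Stage 3 (assumed form): act by following the plan grounded in the map, with no trial and error."""
    state = start
    for action in plan:
        state = env.step(state, action)
    return state


env = GridEnv()
transitions = global_exploration(env, (0, 0))
plan = task_specific_map(transitions, (0, 0), (3, 3))
final_state = knowledge_augmented_execution(env, (0, 0), plan)
```

In this toy setting the exploration cost is paid once, and the same `transitions` record can be reused to build a map for any other goal, which loosely mirrors the abstract's distinction between environment-general priors and a task-specific map.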
Problem

Research questions and friction points this paper is trying to address.

Delayed Environmental Perception
Epistemic Bottleneck
Interactive Agent Reasoning
Long-Horizon Planning
Cognitive Map
Innovation

Methods, ideas, or system contributions that make the work stand out.

Map-then-Act Paradigm
Cognitive Map
Environmental Perception
Long-Horizon Reasoning
Interactive Agents
Yuxin Liu
University of Science and Technology of China, Meituan
Ziang Ye
University of Science and Technology of China, Meituan
Yueqing Sun
Meituan
Mingye Zhu
University of Science and Technology of China
Jinwei Xiao
Meituan, Institute of Automation, Chinese Academy of Sciences
Zhuowen Han
Meituan, Tianjin University
Qi GU
PhD, FRSC, Professor | Chinese Academy of Sciences
Bioinspired Engineering, Biofabrication, 3D printing, Biomedical Materials, Biomechanics
Xunliang Cai
Meituan
Lei Zhang
University of Science and Technology of China