AI Summary
This work addresses a performance bottleneck in LLM-augmented hierarchical agents. Low-level controllers fail repeatedly during execution, and these agents lack a mechanism that integrates memory with reasoning. The authors propose a hierarchical agent architecture that explicitly incorporates causal reasoning into episodic memory by constructing a causal event graph, which enables robust recall under changing viewpoints. To further strengthen coordination between memory and reasoning, the framework introduces an opportunistic task scheduler and a multi-scale progressive exploration strategy, aligning “What-Where-When” episodic memory with “Which-Why” causal inference. Evaluated on long-horizon, sparse-reward tasks in Minecraft, the approach substantially improves both task success rates and execution efficiency, with particularly strong gains in settings that require adaptive decision-making.
Abstract
Rapid advances have been made in developing general-purpose embodied agents in environments like Minecraft through LLM-augmented hierarchical approaches. Despite their promise, low-level controllers often become performance bottlenecks due to repeated execution failures. We argue that a key limitation is not only the lack of episodic memory but also the decoupling of what-where-when memory from which-why reasoning. To address this, we propose WISE (Which-Why Informed Semantic Explorer), a long-horizon agent framework whose enhanced low-level controller is equipped with a Causal Event Graph. This graph augments episodic memory with explicit causal structure linking observations to task relevance. Unlike prior work such as MrSteve, which relies on feature similarity for retrieval, WISE enables robust recall under viewpoint changes and supports opportunistic task reordering through causal reasoning. Building on this memory, we propose an Opportunistic Task Scheduler that dynamically re-prioritizes subtasks when causally relevant opportunities are detected. We further equip WISE with a multi-scale progressive exploration strategy that provides spatially comprehensive observations for downstream reasoning. Experiments show that WISE substantially improves task success and efficiency on long-horizon, sparse-reward tasks, particularly in settings that require adaptive decision-making.
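The abstract does not describe how the Causal Event Graph or the Opportunistic Task Scheduler are implemented. What follows is a minimal sketch of the general idea, not WISE's code. It assumes a hand-written table (`enables`) stating which observed entity makes which subtask actionable. Every name here (`Event`, `CausalEventGraph`, `recall`, `reschedule`, `radius`, `prereqs`) is hypothetical. Each event stores what was seen, where, and when. The causal links answer which remembered event supports a subtask and why. Retrieval is keyed by semantic label and causal link rather than by visual feature similarity, which is one way recall can stay independent of the viewpoint an object was seen from.

```python
from __future__ import annotations

import math
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    """One episodic-memory entry: what was seen, where, and when."""
    what: str                    # semantic label, e.g. "iron_ore" (assumed, not pixels)
    where: tuple[float, float]   # (x, z) world position of the observation
    when: int                    # timestep of the observation


@dataclass
class CausalEventGraph:
    """Hypothetical causal event graph: events linked to the subtasks they enable."""
    # Assumed "which-why" knowledge: entity label -> subtasks it makes actionable.
    enables: dict[str, set[str]]
    events: list[Event] = field(default_factory=list)
    # Causal edges: subtask -> indices of stored events that support it.
    support: defaultdict[str, list[int]] = field(
        default_factory=lambda: defaultdict(list)
    )

    def add(self, event: Event) -> set[str]:
        """Store an event and return the subtasks it causally supports."""
        idx = len(self.events)
        self.events.append(event)
        tasks = set(self.enables.get(event.what, set()))
        for task in tasks:
            self.support[task].append(idx)
        return tasks

    def recall(self, task: str, pos: tuple[float, float]) -> Event | None:
        """Return the nearest stored event that supports `task`, or None.

        Lookup follows causal links, not feature similarity, so it does not
        depend on the viewpoint the event was observed from.
        """
        candidates = [self.events[i] for i in self.support.get(task, [])]
        if not candidates:
            return None
        return min(candidates, key=lambda e: math.dist(e.where, pos))


def reschedule(
    plan: list[str],
    graph: CausalEventGraph,
    pos: tuple[float, float],
    radius: float,
    prereqs: dict[str, set[str]] | None = None,
    done: set[str] | frozenset[str] = frozenset(),
) -> list[str]:
    """Move subtasks with a nearby, actionable opportunity to the front.

    A subtask is promoted only if its (assumed) prerequisites are done and a
    remembered supporting event lies within `radius`. Relative order is kept
    within the promoted group and within the remaining group.
    """
    prereqs = prereqs or {}

    def is_opportunity(task: str) -> bool:
        if not prereqs.get(task, set()) <= set(done):
            return False
        event = graph.recall(task, pos)
        return event is not None and math.dist(event.where, pos) <= radius

    flags = [is_opportunity(t) for t in plan]
    promoted = [t for t, f in zip(plan, flags) if f]
    remaining = [t for t, f in zip(plan, flags) if not f]
    return promoted + remaining


if __name__ == "__main__":
    graph = CausalEventGraph(enables={"iron_ore": {"mine_iron"}, "sheep": {"collect_wool"}})
    graph.add(Event("sheep", (100.0, 0.0), when=3))
    graph.add(Event("iron_ore", (5.0, 2.0), when=7))
    plan = ["craft_table", "collect_wool", "mine_iron"]
    print(reschedule(plan, graph, pos=(0.0, 0.0), radius=10.0))
    # ['mine_iron', 'craft_table', 'collect_wool']
```

In this toy run, the remembered iron ore lies about 5.4 blocks away, so `mine_iron` jumps ahead. The sheep, 100 blocks away, does not trigger a reorder. Promoting a subtask only when its prerequisites are met and a supporting event lies within `radius` is a simple stand-in for causal re-prioritization. The abstract does not specify the criteria WISE actually uses.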