OpenHA: A Series of Open-Source Hierarchical Agentic Models in Minecraft

📅 2025-09-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In open-world environments, systematic approaches to action-space abstraction and tokenization for Vision-Language-Action (VLA) agents remain lacking. This paper addresses the gap for Minecraft-based VLA agents by proposing a unified hierarchical agent framework built on the “Chain of Action” (CoA) paradigm: abstract actions are modeled as learnable reasoning steps rather than predefined sub-policy instructions, which enables end-to-end integration of high-level planning and low-level execution. Methodologically, the work combines multi-granularity action-space modeling, task-aware tokenization, and joint large-scale behavioral cloning. Evaluated on a benchmark of more than 800 diverse tasks, the All-in-One agent improves overall task success rate over specialized baselines, establishing a new state of the art. The full model, toolchain, and dataset are publicly released to support reproducible research.

📝 Abstract

The choice of action space is a critical yet unresolved challenge in developing capable, end-to-end trainable agents. This paper first presents a large-scale, systematic comparison of prominent abstracted action spaces and tokenizers for Vision-Language-Action (VLA) and hierarchical agent models in the open-ended Minecraft environment. Our analysis reveals that no single action space is universally optimal; instead, the most effective abstraction is highly task-dependent, creating a dilemma for building generalist agents. To resolve this, we introduce Chain of Action (CoA), a novel framework that unifies high-level planning and low-level control within a single, monolithic VLA model. CoA treats an abstracted action not as a command for a separate policy, but as an intermediate reasoning step, akin to a chain of thought, that guides the generation of the final, executable action. Furthermore, we demonstrate that an All-in-One agent trained on a diverse mixture of action spaces using the CoA paradigm learns a more robust and generalizable policy. This unified agent achieves a new state of the art, improving the overall task success rate over strong, specialized baselines. To foster reproducible research, we release the OpenHA (Open Hierarchical Agents) suite, which includes our comprehensive benchmark of over 800 distinct tasks, curated datasets, source code, and all pretrained model checkpoints at https://github.com/CraftJarvis/OpenHA
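The CoA idea in the abstract, where one model first emits an abstracted action and then conditions the executable action on it, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `toy_model` stand-in, the `<abstract>` and `<action>` markers, the prompt layout, and the action strings are hypothetical and are not taken from the OpenHA code or paper.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interface: a single autoregressive model that maps a text
# prefix to a continuation. A real VLA model would also consume images.
Model = Callable[[str], str]


@dataclass
class Step:
    abstract_action: str  # intermediate "reasoning" step
    env_action: str       # final executable action


def toy_model(prefix: str) -> str:
    """Toy stand-in for one monolithic model that handles both stages."""
    if prefix.endswith("<abstract>"):
        return "approach(oak_log)"
    if prefix.endswith("<action>"):
        # The low-level action depends on the abstract action in the prefix.
        return "move_forward" if "approach" in prefix else "noop"
    return ""


def chain_of_action_step(model: Model, instruction: str, observation: str) -> Step:
    # Stage 1: generate the abstracted action as an intermediate step.
    prefix = f"task: {instruction} | obs: {observation} <abstract>"
    abstract = model(prefix)
    # Stage 2: the same model continues from a prefix that now contains
    # the abstracted action, and produces the executable action.
    prefix = f"{prefix}{abstract} <action>"
    action = model(prefix)
    return Step(abstract, action)


step = chain_of_action_step(toy_model, "chop a tree", "tree ahead")
print(step)
```

The point of the sketch is the control flow: no separate low-level policy is called, and the abstracted action only influences the result by being part of the context for the second generation.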

Problem

Research questions and friction points this paper is trying to address.

Comparing action spaces and tokenizers for Minecraft agents

Resolving task-dependent optimality in generalist agent design

Unifying high-level planning and low-level control in VLA models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified CoA framework for hierarchical planning

Treats abstracted actions as intermediate reasoning steps

Trains on a diverse mixture of action spaces
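As a rough illustration of training on a mixture of action spaces, the Python sketch below draws training examples from several per-action-space datasets according to mixing weights. The dataset names (`space_a`, `space_b`), the example strings, and the weights are hypothetical placeholders; the source does not specify how the mixture is sampled.

```python
import random
from typing import Dict, List, Tuple


def sample_mixture(
    datasets: Dict[str, List[str]],
    weights: Dict[str, float],
    n: int,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Draw n (action_space_name, example) pairs from weighted datasets."""
    rng = random.Random(seed)  # local generator keeps sampling reproducible
    names = list(datasets)
    probs = [weights[name] for name in names]
    batch = []
    for _ in range(n):
        # Pick an action space by weight, then a uniform example from it.
        name = rng.choices(names, weights=probs, k=1)[0]
        batch.append((name, rng.choice(datasets[name])))
    return batch


# Hypothetical toy data: each list holds trajectories in one action format.
datasets = {
    "space_a": ["traj_a1", "traj_a2"],
    "space_b": ["traj_b1"],
}
weights = {"space_a": 0.7, "space_b": 0.3}
batch = sample_mixture(datasets, weights, n=8)
print(batch)
```

A single model trained on batches like this sees every action format, which is the precondition for the All-in-One agent described above.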

🔎 Similar Papers

Odyssey: Empowering Minecraft Agents with Open-World Skills