🤖 AI Summary
To address key bottlenecks in complex robotic manipulation (weak modeling of long-horizon dependencies, difficult multi-agent coordination, and insufficient dynamic long-horizon planning), this paper proposes a brain-inspired hierarchical embodied agent collective. Methodologically, it introduces a neural-circuit-inspired reflection module and a multi-agent collaboration mechanism with dynamic agent specialization. These are combined with multimodal vision-language-action (VLA) planning and reasoning, neuro-inspired functional modules, a decentralized architecture, and hierarchical decision-making, so that the collective's coordination adapts to task horizon and complexity. Experiments show a 23% reduction in average long-horizon task completion time. On multi-path complex tasks, where state-of-the-art VLA models consistently fail, the framework achieves non-zero success rates of 12–31%.
📝 Abstract
Recent advances in multimodal vision-language-action (VLA) models have revolutionized traditional robot learning, enabling systems to interpret vision, language, and action in unified frameworks for complex task planning. However, mastering complex manipulation tasks remains an open challenge. It is constrained by limitations in persistent contextual memory, multi-agent coordination under uncertainty, and dynamic long-horizon planning across variable sequences. To address this challenge, we propose HiBerNAC, a Hierarchical Brain-emulated Robotic Neural Agent Collective. It is inspired by breakthroughs in neuroscience, particularly in neural circuit mechanisms and hierarchical decision-making. Our framework combines (1) multimodal VLA planning and reasoning with (2) neuro-inspired reflection and multi-agent mechanisms, specifically designed for complex robotic manipulation tasks. By pairing neuro-inspired functional modules with decentralized multi-agent collaboration, our approach enables robust real-time execution of complex manipulation tasks. In addition, the agentic system exhibits scalable collective intelligence through dynamic agent specialization, adapting its coordination strategy to variable task horizons and complexity. We ran extensive experiments on complex manipulation tasks against state-of-the-art VLA models. HiBerNAC reduces average long-horizon task completion time by 23%. It also achieves non-zero success rates (12–31%) on multi-path tasks where prior state-of-the-art VLA models consistently fail. These results provide indicative evidence for bridging biological cognition and robotic learning mechanisms.