Cognition to Control: Multi-Agent Learning for Human-Humanoid Collaborative Transport

📅 2026-03-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of translating high-level human intentions into stable, safe, and adaptive whole-body motion control in human–robot collaboration. The authors propose a novel three-layer cognitive-control architecture that explicitly integrates System 2–style deliberative reasoning with System 1–style rapid reactive mechanisms—a first in this domain. The framework synergistically combines vision-language models, multi-agent reinforcement learning under decentralized Markov potential games, and whole-body dynamics control, enabling role-agnostic, adaptive coordination without predefined roles. A residual policy further internalizes the human partner’s dynamics to enhance responsiveness. Evaluated on cooperative object-carrying tasks, the system significantly outperforms both single-agent and end-to-end baselines, demonstrating superior success rates, robustness, and the emergence of spontaneous leader–follower behaviors.

📝 Abstract
Effective human-robot collaboration (HRC) requires translating high-level intent into contact-stable whole-body motion while continuously adapting to a human partner. Many vision-language-action (VLA) systems learn end-to-end mappings from observations and instructions to actions, but they often emphasize reactive (System 1-like) behavior and leave under-specified how sustained System 2-style deliberation can be integrated with reliable, low-latency continuous control. This gap is acute in multi-agent HRC, where long-horizon coordination decisions and physical execution must co-evolve under contact, feasibility, and safety constraints. We address this limitation with cognition-to-control (C2C), a three-layer hierarchy that makes the deliberation-to-control pathway explicit: (i) a VLM-based grounding layer that maintains persistent scene referents and infers embodiment-aware affordances/constraints; (ii) a deliberative skill/coordination layer (the System 2 core) that optimizes long-horizon skill choices and sequences under human-robot coupling via decentralized MARL, cast as a Markov potential game with a shared potential encoding task progress; and (iii) a whole-body control layer that executes the selected skills at high frequency while enforcing kinematic/dynamic feasibility and contact stability. The deliberative layer is realized as a residual policy relative to a nominal controller, internalizing partner dynamics without explicit role assignment. Experiments on collaborative manipulation tasks show higher success and robustness than single-agent and end-to-end baselines, with stable coordination and emergent leader-follower behaviors.
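The residual-policy idea from the abstract can be sketched minimally: the executed command is the nominal controller's output plus a learned correction that accounts for the partner. This is an illustrative sketch with assumed names and a linear residual, not the paper's actual controller or policy parameterization.

```python
import numpy as np

def nominal_controller(state, target):
    """Stand-in for the nominal whole-body controller: a simple
    proportional command toward the target (gain is illustrative)."""
    return 1.5 * (target - state)

def residual_policy(state, partner_force, weights):
    """Hypothetical learned residual: a linear map over the robot state
    and a sensed partner wrench, standing in for a trained network."""
    features = np.concatenate([state, partner_force])
    return weights @ features

def act(state, target, partner_force, weights):
    # Executed action = nominal command + learned residual, so the
    # policy only needs to internalize the partner's dynamics rather
    # than relearn the full control law.
    return nominal_controller(state, target) + residual_policy(state, partner_force, weights)

state = np.zeros(3)
target = np.ones(3)
partner_force = np.array([0.1, -0.2, 0.0])
weights = np.zeros((3, 6))  # untrained residual: behavior reduces to the nominal controller
print(act(state, target, partner_force, weights))
```

With zero residual weights the action equals the nominal command, which is why residual formulations give a safe starting point for learning.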
Problem

Research questions and friction points this paper is trying to address.

human-robot collaboration
multi-agent learning
whole-body control
deliberative reasoning
collaborative transport
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cognition-to-Control
Vision-Language-Action
Multi-Agent Reinforcement Learning
Markov Potential Game
Whole-Body Control
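Among the contributions listed above, the Markov potential game structure can be illustrated with a tiny static example: every agent's reward change from a unilateral action switch equals the change in one shared potential. The values and the identical-interest reward below are illustrative assumptions, not taken from the paper.

```python
import itertools

# Two agents, two discrete actions each. The shared potential plays the
# role of the paper's task-progress potential; numbers are illustrative.
potential = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}

def reward(agent, joint):
    # Identical-interest special case: each agent is rewarded with the
    # shared potential itself, which satisfies the potential-game
    # condition by construction.
    return potential[joint]

# Verify the defining property: for every agent, joint action, and
# unilateral deviation, the reward change equals the potential change.
for agent in (0, 1):
    for joint in itertools.product((0, 1), repeat=2):
        for alt in (0, 1):
            dev = list(joint)
            dev[agent] = alt
            dev = tuple(dev)
            lhs = reward(agent, dev) - reward(agent, joint)
            rhs = potential[dev] - potential[joint]
            assert abs(lhs - rhs) < 1e-12
print("potential-game condition holds")
```

This alignment between individual rewards and one potential is what lets decentralized learners improve a common objective without explicit role assignment.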
Hao Zhang
Department of Electrical Engineering, the University of Texas at Arlington, 76010 Arlington, USA; Department of Mechanical Engineering, Carnegie Mellon University, 15213 Pittsburgh, USA
Ding Zhao
Carnegie Mellon University
Trustworthy AI, AI safety, reinforcement learning, autonomous vehicles, robotics
H. Eric Tseng
University of Texas at Arlington
Automotive Control