🤖 AI Summary
This paper addresses the autonomous management of complex multi-agent workflows in dynamic human-AI teams. It frames the role around five core responsibilities: goal decomposition, task allocation, progress monitoring, adaptation to changing conditions, and transparent stakeholder communication. We formalize workflow management as a partially observable stochastic game (POSG) and identify four foundational challenges: compositional reasoning for hierarchical decomposition, multi-objective optimization under shifting preferences, coordination and planning in ad hoc teams, and governance and compliance by design. To advance this agenda, we propose a “Manager Agent” powered by large language models (e.g., GPT-5) that combines task graph generation with multi-agent collaborative planning, and we release MA-Gym, an open-source simulation and evaluation framework. Evaluating GPT-5-based Manager Agents across 20 workflows shows that they struggle to jointly optimize goal completion, constraint adherence, and workflow runtime, which underscores that workflow management remains a difficult open problem. MA-Gym provides a reproducible benchmark for studying autonomous workflow management and for measuring progress on these challenges.
📝 Abstract
While agentic AI has advanced in automating individual tasks, managing complex multi-agent workflows remains a challenging problem. This paper presents a research vision for autonomous agentic systems that orchestrate collaboration within dynamic human-AI teams. We propose the Autonomous Manager Agent as a core challenge: an agent that decomposes complex goals into task graphs, allocates tasks to human and AI workers, monitors progress, adapts to changing conditions, and maintains transparent stakeholder communication. We formalize workflow management as a Partially Observable Stochastic Game and identify four foundational challenges: (1) compositional reasoning for hierarchical decomposition, (2) multi-objective optimization under shifting preferences, (3) coordination and planning in ad hoc teams, and (4) governance and compliance by design. To advance this agenda, we release MA-Gym, an open-source simulation and evaluation framework for multi-agent workflow orchestration. Evaluating GPT-5-based Manager Agents across 20 workflows, we find that they struggle to jointly optimize for goal completion, constraint adherence, and workflow runtime, underscoring workflow management as a difficult open problem. We conclude with organizational and ethical implications of autonomous management systems.
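For readers unfamiliar with the formalism named above, a Partially Observable Stochastic Game is usually written as the tuple below. This is the generic textbook form, given only as a reference point: the symbols are conventional choices and are not necessarily the notation used in the paper, and the reading of each component in workflow terms is an illustrative assumption.

```latex
% Generic POSG definition (standard form; notation assumed, not taken from the paper).
\[
\mathcal{G} \;=\; \bigl\langle\, \mathcal{I},\; \mathcal{S},\; b^{0},\;
  \{\mathcal{A}_i\}_{i\in\mathcal{I}},\; \{\mathcal{O}_i\}_{i\in\mathcal{I}},\;
  P,\; \{R_i\}_{i\in\mathcal{I}} \,\bigr\rangle
\]
% \mathcal{I}   : finite set of agents (assumed here: the manager plus human and AI workers)
% \mathcal{S}   : set of states (assumed here: the workflow state, e.g. the task graph and its progress)
% b^{0}         : initial state distribution, b^{0} \in \Delta(\mathcal{S})
% \mathcal{A}_i : action set of agent i; a joint action is a = (a_1, \dots, a_{|\mathcal{I}|})
% \mathcal{O}_i : observation set of agent i; each agent sees only its own observation o_i
% P             : joint transition and observation function, P(s', o \mid s, a)
% R_i           : reward function of agent i, R_i : \mathcal{S} \times \mathcal{A} \to \mathbb{R},
%                 where \mathcal{A} = \textstyle\prod_{i} \mathcal{A}_i
```

Because each agent has its own reward function and acts on partial observations, this form can express the competing objectives the abstract lists (goal completion, constraint adherence, runtime) without assuming that all team members share one utility.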