🤖 AI Summary
To address the low reliability and poor scalability of large language model (LLM)-based multi-agent systems on complex tasks, this paper proposes the first version control framework specifically designed for multi-agent behavioral trajectories. Inspired by Git, it introduces commit, branch, and rollback primitives to enable fine-grained agent state management, parallel exploration, and fault recovery. Methodologically, the framework builds an infrastructure layer atop LangGraph, supporting state snapshots of collaborative workflows, cross-branch trajectory comparison, and atomic rollbacks. Its core contribution lies in adapting software engineering's version control paradigm to multi-agent systems, enabling, for the first time, safe experimentation, iterative debugging, and A/B testing of agent behaviors. Evaluated on a real-world task of retrieving and analyzing scientific paper abstracts, the framework reduces redundant computation, achieving average reductions of 32.7% in both execution time and token consumption, while also improving system stability and collaborative efficiency.
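The commit/branch/rollback primitives described above can be sketched as follows. This is a minimal illustrative Python sketch, not AgentGit's actual API: the `StateRepo` class and all method names here are hypothetical, standing in for the Git-like state management the paper builds on top of LangGraph.

```python
from copy import deepcopy

class StateRepo:
    """Illustrative Git-like versioning of agent state (hypothetical API)."""

    def __init__(self, initial_state):
        self.snapshots = {}              # commit_id -> (state, message, parent)
        self.branches = {"main": None}   # branch name -> latest commit_id
        self.head = "main"
        self._counter = 0
        self.commit(initial_state, "initial commit")

    def commit(self, state, message=""):
        # Snapshot the full agent state; deepcopy makes the snapshot immutable
        self._counter += 1
        cid = f"c{self._counter}"
        self.snapshots[cid] = (deepcopy(state), message, self.branches[self.head])
        self.branches[self.head] = cid
        return cid

    def branch(self, name):
        # A new branch starts at the current branch's latest commit,
        # enabling parallel exploration of alternative trajectories
        self.branches[name] = self.branches[self.head]
        self.head = name

    def rollback(self, commit_id):
        # Atomically restore a snapshot and move the branch pointer back
        state, _, _ = self.snapshots[commit_id]
        self.branches[self.head] = commit_id
        return deepcopy(state)

# Example: A/B-test two prompts, then recover the prompt-A trajectory
repo = StateRepo({"messages": []})
c_a = repo.commit({"messages": ["draft with prompt A"]}, "try prompt A")
repo.branch("prompt-b")                                  # safe exploration
repo.commit({"messages": ["draft with prompt B"]}, "try prompt B")
restored = repo.rollback(c_a)                            # fault recovery
print(restored)  # {'messages': ['draft with prompt A']}
```

The deep copies are what make rollback safe: later agent steps cannot mutate an earlier snapshot, so restoring a commit always yields the exact state that was recorded.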
📝 Abstract
With the rapid progress of large language models (LLMs), LLM-powered multi-agent systems (MAS) are drawing increasing interest across academia and industry. However, many current MAS frameworks struggle with reliability and scalability, especially on complex tasks. We present AgentGit, a framework that brings Git-like rollback and branching to MAS workflows. Built as an infrastructure layer on top of LangGraph, AgentGit supports state commit, revert, and branching, allowing agents to traverse, compare, and explore multiple trajectories efficiently. To evaluate AgentGit, we designed an experiment that optimizes target agents by selecting better prompts. We ran a multi-step A/B test against three baselines -- LangGraph, AutoGen, and Agno -- on a real-world task: retrieving and analyzing paper abstracts. Results show that AgentGit significantly reduces redundant computation, lowers runtime and token usage, and supports parallel exploration across multiple branches, enhancing both reliability and scalability in MAS development. This work offers a practical path to more robust MAS design and enables error recovery, safe exploration, iterative debugging, and A/B testing in collaborative AI systems.