🤖 AI Summary
In multi-agent systems, LLM-based error attribution suffers from inaccurate agent- and step-level fault localization and inconsistent results. To address these challenges, we propose ECHO—a novel error attribution algorithm featuring three key innovations: (1) a hierarchical context structure that disentangles interaction trajectories by positional encoding; (2) a goal-consistency evaluation criterion that models task completion rather than mere output correctness; and (3) a multi-agent consensus voting mechanism to mitigate bias under strongly coupled interactions. Extensive experiments across diverse complex multi-agent reasoning tasks demonstrate that ECHO significantly outperforms baselines—including one-shot evaluation, stepwise analysis, and binary search—achieving 27.4%–41.8% absolute gains in attribution accuracy for fine-grained reasoning errors and high-dependency scenarios. Moreover, ECHO exhibits strong cross-task generalization capability, validating its robustness and scalability in realistic multi-agent settings.
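The hierarchical context structure in innovation (1) can be illustrated with a minimal sketch. The summary does not specify the actual leveling scheme, so the fixed-size chunking and the `level_size` parameter below are purely hypothetical, meant only to show how positional tagging lets an evaluator reason over coarse segments of a trajectory before drilling into individual steps:

```python
def build_hierarchical_context(trajectory, level_size=3):
    """Group a flat interaction trajectory into positional levels.

    Each message is tagged with its absolute position, then chunked into
    fixed-size levels. Both the chunking strategy and `level_size` are
    illustrative assumptions, not ECHO's published scheme.
    """
    indexed = [(i, msg) for i, msg in enumerate(trajectory)]
    return [indexed[i:i + level_size] for i in range(0, len(indexed), level_size)]

# A toy five-step trajectory split into two levels.
trajectory = ["plan", "search", "summarize", "verify", "answer"]
levels = build_hierarchical_context(trajectory)
# levels[0] == [(0, "plan"), (1, "search"), (2, "summarize")]
```

Retaining absolute positions inside each level is the point of the exercise: a step keeps its global index even when the evaluator only sees one segment at a time.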
📝 Abstract
Error attribution in Large Language Model (LLM) multi-agent systems presents a significant challenge for debugging and improving collaborative AI systems. Current approaches to pinpointing agent- and step-level failures in interaction traces, whether all-at-once evaluation, step-by-step analysis, or binary search, fall short when analyzing complex failure patterns, struggling with both accuracy and consistency. We present ECHO (Error attribution through Contextual Hierarchy and Objective consensus analysis), a novel algorithm that combines hierarchical context representation, objective analysis-based evaluation, and consensus voting to improve error attribution accuracy. Our approach applies position-based leveling to structure contextual understanding while maintaining objective evaluation criteria, and reaches final conclusions through a consensus mechanism. Experimental results demonstrate that ECHO outperforms existing methods across a range of multi-agent interaction scenarios, showing particular strength in cases involving subtle reasoning errors and complex interdependencies. Our findings suggest that structured, hierarchical context representation, combined with consensus-based objective decision-making, provides a more robust framework for error attribution in multi-agent systems.
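The consensus mechanism the abstract describes can be sketched as a majority vote over independent attribution passes. The abstract does not give the actual aggregation rule, so the `(agent, step)` tuple representation and the simple plurality vote below are assumptions for illustration only:

```python
from collections import Counter

def consensus_attribution(attributions):
    """Aggregate independent fault attributions by plurality vote.

    `attributions` is a list of (agent_name, step_index) tuples, one per
    evaluator pass. Returns the most frequently blamed (agent, step)
    pair together with its agreement ratio. This is a hypothetical
    sketch of a consensus step, not ECHO's published procedure.
    """
    if not attributions:
        raise ValueError("need at least one attribution")
    counts = Counter(attributions)
    (winner, votes), = counts.most_common(1)
    return winner, votes / len(attributions)

# Three independent evaluator passes over the same trajectory; two of
# three blame the "planner" agent at step 2.
votes = [("planner", 2), ("planner", 2), ("executor", 5)]
winner, agreement = consensus_attribution(votes)
# winner == ("planner", 2), agreement == 2/3
```

Reporting the agreement ratio alongside the winner is a natural way to surface the consistency problem the paper targets: a low ratio flags trajectories where single-pass evaluation would have been unreliable.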