🤖 AI Summary
Large language model (LLM) agents often produce elusive silent errors in complex tasks that are difficult to localize. This work proposes a closed-loop diagnostic framework that iteratively refines error localization by controllably replaying execution trajectories, diagnosing candidate erroneous steps, applying targeted intervention patches, and leveraging outcome comparisons before and after intervention as attribution evidence. Notably, this approach is the first to directly utilize intervention feedback to refine the attribution process itself, substantially enhancing both accuracy and actionability. Evaluated on four cross-domain multi-hop reasoning benchmarks, the method achieves state-of-the-art error localization performance, particularly excelling in structured tool-use scenarios, and enables effective attribution even in the absence of ground-truth answers.
📝 Abstract
Large language model (LLM) agents now solve complex tasks through long plan-and-execution traces, yet the ability to locate errors in a completed traces still lags far behind, especially in the \emph{silent failure} regime. Existing approaches predict suspect steps via classifiers or LLM judges, or recover correct answers via retry, but none feed the intervention outcome back to \emph{refine the attribution itself}. We propose \methodname, a method that closes this gap by diagnosing a candidate error step, testing it through controlled replay with a diagnosis-specific patch, and using the verified outcome flip as contrastive evidence to refine the final attribution. Across four localization benchmarks spanning multi-hop reasoning across domains, \methodname achieves the highest localization accuracy among same-auditor methods across all four benchmarks, with the largest gains on structured tool-use traces, while providing actionable localization even when ground-truth answers are unavailable.