REFLECT: Intervention-Supported Error Attribution for Silent Failures in LLM Agent Traces

📅 2026-06-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Large language model (LLM) agents often fail silently on complex tasks, producing errors that are hard to localize in the execution trace. This work proposes a closed-loop diagnostic framework that diagnoses a candidate erroneous step, replays the trajectory under control with a targeted intervention patch, and uses the comparison of outcomes before and after the intervention as attribution evidence. Unlike prior classifier-based, judge-based, or retry-based approaches, it feeds the intervention outcome back to refine the attribution itself. Evaluated on four cross-domain multi-hop reasoning benchmarks, the method achieves the highest localization accuracy among same-auditor methods, with the largest gains on structured tool-use traces, and it still provides actionable localization when ground-truth answers are unavailable.

📝 Abstract

Large language model (LLM) agents now solve complex tasks through long plan-and-execution traces, yet the ability to locate errors in a completed trace still lags far behind, especially in the silent failure regime. Existing approaches predict suspect steps via classifiers or LLM judges, or recover correct answers via retry, but none feed the intervention outcome back to refine the attribution itself. We propose REFLECT, a method that closes this gap by diagnosing a candidate error step, testing it through controlled replay with a diagnosis-specific patch, and using the verified outcome flip as contrastive evidence to refine the final attribution. Across four localization benchmarks spanning multi-hop reasoning across domains, REFLECT achieves the highest localization accuracy among same-auditor methods on every benchmark, with the largest gains on structured tool-use traces, while providing actionable localization even when ground-truth answers are unavailable.
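
The diagnose, replay, and compare loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the trace is reduced to a list of pure functions over an integer state, the auditor's ranked candidates and their patches are hard-coded, and every name (`replay`, `attribute`, `Evidence`) is hypothetical. It only shows how an outcome flip under a patched replay can serve as evidence for an attribution without consulting a ground-truth answer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical toy model: each step maps the running state to a new state.
Step = Callable[[int], int]


@dataclass
class Evidence:
    step_index: int
    original_outcome: int
    patched_outcome: int

    @property
    def flipped(self) -> bool:
        # A flip means the final outcome changed under the patch.
        # No ground-truth answer is consulted.
        return self.original_outcome != self.patched_outcome


def replay(steps: List[Step], initial: int,
           patch_at: Optional[int] = None,
           patch: Optional[Step] = None) -> int:
    """Re-run the trace, substituting `patch` for the step at `patch_at`."""
    state = initial
    for i, step in enumerate(steps):
        if patch is not None and i == patch_at:
            state = patch(state)
        else:
            state = step(state)
    return state


def attribute(steps: List[Step], initial: int, candidates: List[int],
              patches: Dict[int, Step]) -> Tuple[Optional[int], List[Evidence]]:
    """Test candidates in ranked order; return the first one whose patch flips the outcome."""
    original = replay(steps, initial)
    evidence = [
        Evidence(i, original, replay(steps, initial, i, patches[i]))
        for i in candidates
    ]
    flipped = [e for e in evidence if e.flipped]
    culprit = flipped[0].step_index if flipped else None
    return culprit, evidence


# Toy trace: step 1 silently zeroes the state (assumed intended behaviour: multiply by 3).
trace: List[Step] = [lambda s: s + 2, lambda s: s * 0, lambda s: s + 1]

# Assumed auditor output: ranked candidate steps and a diagnosis-specific patch for each.
candidate_steps = [0, 1]
candidate_patches: Dict[int, Step] = {0: lambda s: s + 5, 1: lambda s: s * 3}

culprit, evidence = attribute(trace, 1, candidate_steps, candidate_patches)
```

In this toy run, patching step 0 leaves the final outcome unchanged because the faulty step 1 masks any upstream change, whereas patching step 1 changes the outcome, so step 1 is attributed. A real system would need deterministic or cached replay of tool calls and LLM outputs, and some verification that the flipped outcome is an improvement rather than merely different; both are omitted here.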

Problem

Research questions and friction points this paper is trying to address.

silent failure

error attribution

LLM agents

trace localization

intervention-supported diagnosis

Innovation

Methods, ideas, or system contributions that make the work stand out.

silent failure

error attribution

controlled replay